Abstract. Autonomous critical systems, such as satellites and space rovers, must be
able to detect the occurrence of faults in order to ensure correct operation. This task is 
carried out by Fault Detection and Identification (FDI) components, which are embedded
in those systems and are in charge of detecting faults in an automated and timely manner 
by reading data from sensors and triggering predefined alarms. 

The design of effective FDI components is an extremely hard problem, also due to 
the lack of a complete theoretical foundation, and of precise specification and validation 
techniques. 

In this paper, we present the first formal approach to the design of FDI components 
for discrete event systems, both in a synchronous and asynchronous setting. We propose 
a logical language for the specification of FDI requirements that accounts for a wide class 
of practical cases, and includes novel aspects such as maximality and trace-diagnosability. 
The language is equipped with a clear semantics based on temporal epistemic logic, and 
is proved to enjoy suitable properties. We discuss how to validate the requirements and 
how to verify that a given FDI component satisfies them. We propose an algorithm for 
the synthesis of correct-by-construction FDI components, and report on the applicability 
of the design approach on an industrial case-study coming from aerospace. 


1. Introduction 

The operation of complex critical systems (e.g., trains, satellites, cars) increasingly relies 
on the ability to detect when and which faults occur during operation. This function, 
called Fault Detection and Identification (FDI), provides information that is vital to drive 
the containment of faults and their recovery. This is especially true for fail-operational 
systems, where the occurrence of faults should not compromise the ability to carry on critical 
functions, as opposed to fail-safe systems, where faults are typically handled by going to a 
safe state. FDI is often carried out by dedicated modules, called FDI components, running
in parallel with the system. An FDI component, hereafter also referred to as a diagnoser,
processes sequences of observations, made available by predefined sensors, and is required 
to trigger a set of predefined alarms in a timely and accurate manner. The alarms are then 
used by recovery modules to guarantee the survival of the system without requiring external 
control. Faults are often not directly observable. Their occurrence can only be inferred by 
observing the effects that they have on the observable parts of the system. Moreover, faults 
may have complex dynamics, and may interact with each other in complex ways. 

For these reasons, the design of FDI components is a very challenging task, and also a 
practical problem, as witnessed by multiple Invitations To Tender issued by the European 
Space Agency [Eur10, Eur11, Eur13]. The current methodologies lack a comprehensive
theoretical foundation, and do not provide clear and effective specification and validation
techniques and tools. Most approaches assess the quality of an FDI component based on
simulation and quantitative analysis [FKN+10], which do not start from a specification of
the behavior that the FDI needs to satisfy. This leads to a uniform treatment of all faults,
while in general some faults are more important than others, and in many cases we are not
interested in the specific fault characteristics but only in knowing that the fault occurred in
a given part of the system (isolation). As a consequence, the design often results in very
conservative assumptions, so that the overall system features sub-optimal behaviors, and it 
is not trusted during critical phases. 

The goal of this paper is to propose a formal foundation to support the design of 
FDI components. We provide a way to specify FDI components, and cover the following 
problems: (i) validation of an FDI component specification, (ii) verification of a given FDI 
component with respect to a given specification, and (iii) automated synthesis of an FDI 
component from a given specification. 

The specification of an FDI component is tackled by introducing a pattern-based
language. Intuitively, an FDI component is specified by stating the observable signals (the
inputs of the FDI component), the desired alarms (in terms of the unobservable state), and
by defining the relation between the two. The language supports various forms of delay
(exact, finite, bounded) between the occurrence of faults and the raising of the corresponding
alarm. The patterns are given a formal semantics expressed in terms of epistemic temporal
logic [HV89], where the knowledge operator is used to express the certainty of a condition,
based on the available observations. The formalization encodes properties such as alarm 
correctness and alarm completeness. Correctness states that whenever an alarm is raised by 
the FDI component, then its associated triggering condition did occur; completeness states 
that if an alarm is not raised, then either the associated condition did not occur, or it would 
have been impossible to detect it, given the available observations. Moreover, we precisely 
characterize two aspects that are important for the specification of FDI requirements. The 
first one is the diagnosability of the plant, i.e., whether the sensors convey enough
information to detect the required conditions. We explain how to deal with non-diagnosable plants
by introducing a more fine-grained concept of trace diagnosability, where diagnosability is 
localized to individual traces. Most of the state of the art focuses on the fact that the system 
is diagnosable for any execution. However, in practice, this is rarely the case, since usually 
the plant is diagnosable in many situations but not in all of them. The classic example is 
the one of a burnt light-bulb, of which we cannot say anything until we try to turn it on. 
In this case, we would like to build a diagnoser that can raise the alarm whenever there 
is no ambiguity on whether the light bulb is burnt. Therefore, we introduce the concept
of trace diagnosability, intuitively accepting the fact that the plant might not always be
diagnosable. 

The second important concept that we introduce is maximality. A diagnoser is maximal 
if it is able to raise an alarm as soon as and whenever possible. This, in particular, means 
that in all traces that are diagnosable, a maximal diagnoser needs to raise the alarm. 

The approach provides a full account of synchronous and asynchronous perfect-recall 
semantics for the epistemic operator. We show that the specification language correctly 
captures the formal semantics and we clearly define the relation between diagnosability, 
maximality and correctness. 

Within our setting, the validation of a diagnoser specification is reduced to validity 
checking in temporal epistemic logic, while the verification of a given diagnoser is mapped 
to model checking for a temporal epistemic logic. As for synthesis, we propose an algorithm 
that is proved to generate correct-by-construction diagnosers. 

From the practical standpoint, the applicability of the design approach has been
demonstrated on two projects funded by the European Space Agency [AUT, FAM]. The paper
actually provides the conceptual foundation underlying a design tool-set [ANY+12, BBC+14a,
BBC+14b], which has been applied to the specification, verification and synthesis of an FDI
component for a satellite.

Finally, please note the deep difference between the design of FDI components and most 
diagnosis [dKK04] approaches. In most settings, diagnosis systems can benefit from powerful
computing platforms. Partial diagnoses are typically acceptable, and can be complemented
by further (post-mortem) inspections. This is typical of approaches that rely on logical
reasoning engines (e.g., SAT solvers [GARK07]). Other approaches [HD05, SSL+95, Sch04]
rely on knowledge compilation to reduce the on-line complexity. An FDI component, on 
the contrary, runs on-board (as part of the on-line control strategy), and is subject to 
restrictions of various nature, such as timing and computation power. FDI design thus 
requires a deeper theory, which accounts for the issues of delay in raising the alarms, trace 
diagnosability, and maximality. Moreover, it becomes crucial to be able to verify and certify 
the effectiveness of the system, since it might not be possible to change it after deployment. 

This paper is structured as follows. Section 2 provides some introductory background
and introduces our running example. Section 3 formalizes the notion of FDI. Section 4
presents the specification language. In Section 5 we discuss how to validate the
requirements, and how to verify an FDI component with respect to the requirements. In Section 6
we present an algorithm for the synthesis of correct-by-construction FDI components. The
results of evaluating our approach in an industrial setting are presented in Section 7.
Section 8 compares our work with previous related works. In Section 9 we draw some
conclusions and outline the directions for future work.

2. Background 

2.1. Labeled Transition Systems. In order to model the plant and the FDI, we use a 
symbolic representation of Labeled Transition Systems (LTS). Control locations and data 
are represented by variables, while sets of states and transitions are represented by formulas, 
and transitions are labeled with explicit events. 

Given a set of variables $X$ and a (finite) domain $U$ of values, an assignment to $X$ is a
mapping from the set $X$ to the set $U$. We use $\Sigma(X)$ to denote the set of assignments to $X$.
Given an assignment $\alpha \in \Sigma(X)$ and $X_1 \subseteq X$, we use $\alpha_{|X_1}$ to denote the projection of $\alpha$ over
$X_1$. We use $\mathcal{F}(X)$ to denote the set of propositional formulas over $X$.

Definition 2.1 (LTS). A Labeled Transition System is a tuple $S = \langle V, E, I, T \rangle$, where:

• $V$ is the set of state variables;

• $E$ is the set of events;

• $I \in \mathcal{F}(V)$ is a formula over $V$ defining the initial states;

• $T : E \to \mathcal{F}(V \cup V')$ maps an event $e \in E$ to a formula over $V$ and $V'$ defining the
transition relation for $e$ (with $V'$ being the next version of the state variables).

A state $s$ is an assignment to the state variables $V$ (i.e., $s \in \Sigma(V)$). We denote by $s'$ the
corresponding assignment to $V'$. A transition labeled with $e$ is a pair of states $(s, s')$ such
that $(s, s') \models T(e)$. A trace of $S$ is a sequence $\sigma = s_0, e_0, s_1, e_1, s_2, \ldots$ alternating states and
events such that $s_0$ satisfies $I$ and, for each $k \ge 0$, $(s_k, s_{k+1})$ satisfies $T(e_k)$. Note that we
consider infinite traces only, and w.l.o.g. we assume the system to be deadlock free. Given
$\sigma = s_0, e_0, s_1, e_1, s_2, \ldots$ and an integer $k \ge 0$, we denote by $\sigma^k$ the finite prefix $s_0, e_0, \ldots, s_k$
of $\sigma$ containing the first $k + 1$ states. We denote by $\sigma[k]$ the $(k + 1)$-th state $s_k$. We say that
$s$ is reachable in $S$ iff there exists a trace $\sigma$ of $S$ such that $s = \sigma[k]$ for some $k \ge 0$.

We say that $S$ is deterministic iff:

(i) there is one initial state (i.e., there exists a state $s$ such that $s \models I$ and, for all $t$, if
$t \models I$, then $s = t$);

(ii) for every reachable state $s$, for every event $e$, there is one successor (i.e., there exists $s'$
such that $(s, s') \models T(e)$ and, for all $t'$, if $(s, t') \models T(e)$, then $s' = t'$).
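
The definitions above can be made concrete with a small explicit-state sketch. It is only an illustration: the paper works with a symbolic representation, whereas the hypothetical `LTS` class below enumerates all assignments over a finite domain and represents $I$ and $T(e)$ as Python predicates. The class name, its methods, and the two example systems are assumptions introduced here.

```python
from itertools import product


class LTS:
    """Explicit-state stand-in for a symbolic LTS <V, E, I, T> (illustrative only)."""

    def __init__(self, variables, domain, events, init, trans):
        self.variables = tuple(variables)  # V
        self.domain = tuple(domain)        # finite domain U
        self.events = tuple(events)        # E
        self.init = init                   # I: predicate over a state s
        self.trans = trans                 # T: event -> predicate over (s, s')

    def states(self):
        """Enumerate every assignment to V."""
        for values in product(self.domain, repeat=len(self.variables)):
            yield dict(zip(self.variables, values))

    def initial_states(self):
        return [s for s in self.states() if self.init(s)]

    def successors(self, s, e):
        return [t for t in self.states() if self.trans[e](s, t)]

    def reachable(self):
        seen, frontier = [], self.initial_states()
        while frontier:
            s = frontier.pop()
            if s in seen:
                continue
            seen.append(s)
            for e in self.events:
                frontier.extend(self.successors(s, e))
        return seen

    def is_deterministic(self):
        """Conditions (i) and (ii): one initial state, one successor per event."""
        if len(self.initial_states()) != 1:
            return False
        return all(len(self.successors(s, e)) == 1
                   for s in self.reachable() for e in self.events)


# Hypothetical two-state switch: "flip" toggles the bit, "stay" keeps it.
toggle = LTS(["on"], [0, 1], ["flip", "stay"],
             init=lambda s: s["on"] == 0,
             trans={"flip": lambda s, t: t["on"] == 1 - s["on"],
                    "stay": lambda s, t: t["on"] == s["on"]})

# Same variable with an unconstrained event, which allows two successors.
loose = LTS(["on"], [0, 1], ["any"],
            init=lambda s: s["on"] == 0,
            trans={"any": lambda s, t: True})
```

Here `toggle` satisfies both determinism conditions, while `loose` violates condition (ii) because the event `any` has two successors from the initial state.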

Definition 2.2 (Synchronous Product). Let

$S^1 = \langle V^1, E^1, I^1, T^1 \rangle$ and $S^2 = \langle V^2, E^2, I^2, T^2 \rangle$

be two transition systems with $E^1 = E^2 = E$. We define the synchronous product $S^1 \times S^2$ as
the transition system $\langle V^1 \cup V^2, E, I^1 \wedge I^2, T \rangle$ where, for every $e \in E$, $T(e) = T^1(e) \wedge T^2(e)$.
Every state $s$ of $S^1 \times S^2$ can be considered as the product $s^1 \times s^2$ such that $s^1 = s_{|V^1}$ is
a state of $S^1$ and $s^2 = s_{|V^2}$ is a state of $S^2$. Similarly, every trace $\sigma$ of $S^1 \times S^2$ can be
considered as the product $\sigma^1 \times \sigma^2$ where $\sigma^1$ is a trace of $S^1$ and $\sigma^2$ is a trace of $S^2$.

Definition 2.3 (Asynchronous Product). Let

$S^1 = \langle V^1, E^1, I^1, T^1 \rangle$ and $S^2 = \langle V^2, E^2, I^2, T^2 \rangle$

be two transition systems. We define the asynchronous product $S^1 \otimes S^2$ as the transition
system $\langle V^1 \cup V^2, E^1 \cup E^2, I^1 \wedge I^2, T \rangle$ where:

• for every $e \in E^1 \setminus E^2$, $T(e) = T^1(e) \wedge frame(V^2 \setminus V^1)$;

• for every $e \in E^2 \setminus E^1$, $T(e) = T^2(e) \wedge frame(V^1 \setminus V^2)$;

• for every $e \in E^1 \cap E^2$, $T(e) = T^1(e) \wedge T^2(e)$,

where $frame(X)$ stands for $\bigwedge_{x \in X} x' = x$ and is used to represent the fact that while one
transition system moves on a local event, the other transition system does not change its
local state variables. Every state $s$ of $S^1 \otimes S^2$ can be considered as the product $s^1 \otimes s^2$
such that $s^1 = s_{|V^1}$ is a state of $S^1$ and $s^2 = s_{|V^2}$ is a state of $S^2$. If either $S^1$ or $S^2$
is deterministic, also every trace $\sigma$ of $S^1 \otimes S^2$ can be considered as the product $\sigma^1 \otimes \sigma^2$
where $\sigma^1$ is a trace of $S^1$ and $\sigma^2$ is a trace of $S^2$ (more in general, the product of two traces
produces a set of traces due to different possible interleavings).

In general, composing two systems can reduce the behaviors of each system and
introduce deadlocks. However, given two systems that do not share any state variable (e.g.,
the diagnoser and the plant), if one of the systems is deterministic (the diagnoser) then it
cannot alter the behavior of the second (the plant).

Notice that the synchronous product coincides with the asynchronous case when the 
two sets of events coincide. 
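
As an illustration of Definition 2.3, the following sketch composes the transition relations of two systems, with $T(e)$ represented as a Python predicate over a pair of states (dictionaries from variable names to values). The function names `frame` and `async_product`, and the tiny plant and diagnoser used as an example, are assumptions made here; the initial condition $I^1 \wedge I^2$ is omitted for brevity.

```python
def frame(variables):
    """frame(X): every variable in X keeps its value across the transition."""
    return lambda s, t: all(t[x] == s[x] for x in variables)


def async_product(V1, E1, T1, V2, E2, T2):
    """Return (V, E, T) of S1 (x) S2, following the three cases of the definition."""
    V = set(V1) | set(V2)
    E = set(E1) | set(E2)
    T = {}
    for e in E:
        if e in E1 and e in E2:
            # shared event: both systems move together
            T[e] = lambda s, t, e=e: T1[e](s, t) and T2[e](s, t)
        elif e in E1:
            # local event of S1: the local variables of S2 are frozen
            keep = frame(set(V2) - set(V1))
            T[e] = lambda s, t, e=e, keep=keep: T1[e](s, t) and keep(s, t)
        else:
            # local event of S2: the local variables of S1 are frozen
            keep = frame(set(V1) - set(V2))
            T[e] = lambda s, t, e=e, keep=keep: T2[e](s, t) and keep(s, t)
    return V, E, T


# Hypothetical plant (variable x) and diagnoser (variable a) sharing only "obs".
T_plant = {"obs": lambda s, t: t["x"] == s["x"],
           "hidden": lambda s, t: t["x"] == 1}
T_diag = {"obs": lambda s, t: t["a"] == 1}
V, E, T = async_product({"x"}, {"obs", "hidden"}, T_plant, {"a"}, {"obs"}, T_diag)
```

In this example the diagnoser variable `a` cannot change on the plant-local event `hidden`, which is exactly what the frame condition enforces.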


2.2. Linear Temporal Logic. We now present a Linear Temporal Logic extended with
past operators [Pnu77, LMS02, LPZ85], in the following simply referred to as LTL. A
formula in LTL over variables $V$ and events $E$ is defined as

$\beta ::= p \mid e \mid \beta \wedge \beta \mid \neg\beta \mid O\beta \mid Y\beta \mid \beta S \beta \mid G\beta \mid F\beta \mid X\beta \mid \beta U \beta$

where $p$ is a predicate in $\mathcal{F}(V)$ and $e \in E$. Intuitively, $p$ are the propositions over the
state of the LTS, while $e$ represents an event.

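Given a trace $\sigma = s_0, e_0, s_1, e_1, s_2, \ldots$, the semantics of LTL is defined as follows:

- $\sigma, i \models p$ iff $s_i \models p$

- $\sigma, i \models e$ iff $e_i = e$

- $\sigma, i \models \beta_1 \wedge \beta_2$ iff $\sigma, i \models \beta_1$ and $\sigma, i \models \beta_2$

- $\sigma, i \models \neg\beta$ iff $\sigma, i \not\models \beta$

- Once: $\sigma, i \models O\beta$ iff $\exists j \le i.\ \sigma, j \models \beta$

- Yesterday: $\sigma, i \models Y\beta$ iff $i > 0$ and $\sigma, i - 1 \models \beta$

- Since: $\sigma, i \models \beta_1 S \beta_2$ iff there exists $j \le i$ such that $\sigma, j \models \beta_2$ and for all $k$, $j < k \le i$,
$\sigma, k \models \beta_1$

- Finally: $\sigma, i \models F\beta$ iff $\exists j \ge i.\ \sigma, j \models \beta$

- Globally: $\sigma, i \models G\beta$ iff $\forall j \ge i.\ \sigma, j \models \beta$

- Next: $\sigma, i \models X\beta$ iff $\sigma, i + 1 \models \beta$

- Until: $\sigma, i \models \beta_1 U \beta_2$ iff there exists $j \ge i$ such that $\sigma, j \models \beta_2$ and for all $k$, $i \le k < j$,
$\sigma, k \models \beta_1$.

Given an LTS $S = \langle V, E, I, T \rangle$, $S \models \beta$ iff for every trace $\sigma$ of $S$, $\sigma, 0 \models \beta$.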

Notice that $Y\beta$ is always false in the initial state, and that we use a reflexive semantics
for the operators $U$, $F$, $G$, $S$ and $O$. We use the abbreviations $Y^n\beta = Y Y^{n-1}\beta$ (with
$Y^0\beta = \beta$), $O^{\le n}\beta = \beta \vee Y\beta \vee \cdots \vee Y^n\beta$ and $F^{\le n}\beta = \beta \vee X\beta \vee \cdots \vee X^n\beta$.
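
Because the past operators only look at the prefix of a trace, they can be evaluated directly on a finite list of states. The sketch below shows one way to do this for $Y$, $O$, $S$ and the bounded abbreviation $O^{\le n}$, using the reflexive semantics given above. The combinator names and the example trace are assumptions made for illustration, and events are left out to keep the example short.

```python
def atom(p):
    """Lift a state predicate p to a formula evaluated at position i."""
    return lambda trace, i: p(trace[i])


def neg(f):
    return lambda trace, i: not f(trace, i)


def yesterday(f):
    # Y f: false at position 0, otherwise f at the previous position
    return lambda trace, i: i > 0 and f(trace, i - 1)


def once(f):
    # O f: f held at some j <= i (reflexive)
    return lambda trace, i: any(f(trace, j) for j in range(i + 1))


def since(f, g):
    # f S g: g held at some j <= i and f held at every k with j < k <= i
    return lambda trace, i: any(
        g(trace, j) and all(f(trace, k) for k in range(j + 1, i + 1))
        for j in range(i + 1))


def once_within(f, n):
    # O^{<=n} f = f or Y f or ... or Y^n f
    return lambda trace, i: any(f(trace, j) for j in range(max(0, i - n), i + 1))


# Hypothetical prefix in which a fault proposition holds only at position 1.
states = [{"fault": False}, {"fault": True}, {"fault": False}, {"fault": False}]
fault = atom(lambda s: s["fault"])
```

With this prefix, `once(fault)` holds from position 1 onwards, while `once_within(fault, 1)` holds only at positions 1 and 2.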


2.3. Partial Observability. A partially observable LTS is an LTS $S = \langle V, E, I, T \rangle$
extended with a set $E_o \subseteq E$ of observable events.

We consider here only observations on events. In practice, observations on states are
common and relevant. However, dealing with them in the asynchronous setting makes the
formalism less clear. Therefore, we limit ourselves to observations on events and whenever
observations on state variables are needed, such as sensor readings, we incorporate them in
the events as done in [SSL+96].

The observable part of the prefix $\sigma^k$ of a trace $\sigma$ is defined recursively as follows:
$obs(\sigma^0) = \epsilon$ (empty sequence); if $e \in E_o$, then $obs(\sigma^k, e, s) = obs(\sigma^k), e$; if $e \notin E_o$, then
$obs(\sigma^k, e, s) = obs(\sigma^k)$.


Definition 2.4 (Observation Point). We say that $i$ is an observation point for $\sigma$, denoted
by $ObsPoint(\sigma, i)$, iff the last event of $\sigma^i$ is observable, i.e., iff $\sigma^i = \sigma', e, s$ for some $\sigma', e, s$
and $e \in E_o$.

The notion of two traces being observationally equivalent requires that the two traces
end both or neither in an observation point. This captures the idea that a trace ending in an 
observation point can be distinguished from the same trace extended with local unobservable 
steps. In other terms, an observer can distinguish the instant in which it is observing and 
an instant right after. 

Definition 2.5 (Observational Equivalence). We say that $((\sigma_1, i), (\sigma_2, j)) \in ObsEq$ if and
only if:

- $ObsPoint(\sigma_1, i)$ iff $ObsPoint(\sigma_2, j)$, and

- $obs(\sigma_1^i) = obs(\sigma_2^j)$.
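
The three notions above depend only on the events of a trace, so they can be sketched over plain event lists. In the following illustration, a trace is represented by its list of events $e_0, e_1, \ldots$, so that the prefix $\sigma^i$ contains the first $i$ events; the function names and the example event names are assumptions introduced here.

```python
def obs(events, i, observable):
    """Observable part of the prefix sigma^i, which contains events e_0 .. e_{i-1}."""
    return [e for e in events[:i] if e in observable]


def obs_point(events, i, observable):
    """ObsPoint(sigma, i): the last event of sigma^i exists and is observable."""
    return i > 0 and events[i - 1] in observable


def obs_eq(ev1, i, ev2, j, observable):
    """((sigma_1, i), (sigma_2, j)) in ObsEq, following Definition 2.5."""
    return (obs_point(ev1, i, observable) == obs_point(ev2, j, observable)
            and obs(ev1, i, observable) == obs(ev2, j, observable))


# Hypothetical traces: only "low" is observable.
OBSERVABLE = {"low"}
ev1 = ["leak", "low", "drain"]
ev2 = ["low", "drain"]
```

Here `(ev1, 2)` and `(ev2, 1)` are equivalent, since both end right after observing `low`. The pair `(ev1, 3)` and `(ev2, 1)` has the same observable part but is not equivalent, because only the second ends in an observation point.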


2.4. Temporal Epistemic Logic. Epistemic logic has been used to describe and reason
about knowledge of agents and processes. There are several ways of extending epistemic
logic with temporal operators. We use the logic $KL_1$ [HV89], extended with past operators.
A formula in $KL_1$ is defined as

$\beta ::= p \mid e \mid \beta \wedge \beta \mid \neg\beta \mid O\beta \mid Y\beta \mid \beta S \beta \mid F\beta \mid X\beta \mid \beta U \beta \mid G\beta \mid K\beta$

$KL_1$ can be seen as an extension of LTL with past operators, with the addition of the
epistemic operator $K$. The intuitive semantics of $K\beta$ is that the reasoner knows that $\beta$
holds in a state of a trace $\sigma$, by using only the observable information. This means that
$K\beta$ holds iff $\beta$ holds in all situations that are observationally equivalent. Therefore, while
in LTL the interpretation of a formula is local to a single trace, in $KL_1$ the semantics of
the $K$ operator quantifies over the set of indistinguishable traces. Given a trace $\sigma_1$ of a
partially observable LTS, the semantics of $K$ is formally defined as:

$\sigma_1, i \models K\beta$ iff $\forall \sigma_2, \forall j$. if $((\sigma_1, i), (\sigma_2, j)) \in ObsEq$ then $\sigma_2, j \models \beta$.

$K\beta$ holds at time $i$ in a trace $\sigma_1$ iff $\beta$ holds in all traces that are observationally
equivalent to $\sigma_1$ up to time $i$. Note that, due to the asynchronous nature of the observations,
two traces of different length might lead to the same observable trace. This definition
implicitly forces perfect-recall in the semantics of the epistemic operator, since we define
the epistemic equivalence between traces and not between states.
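
The quantification over indistinguishable traces can be illustrated on a finite set of finite prefixes. This is a deliberate simplification: the definition ranges over all (infinite) traces of the plant, whereas the sketch below assumes that a small hand-written list of prefixes stands in for them. The light-bulb scenario, the event names, and the function `knows` are assumptions made for this example; each trace is a pair of an event list and a list of truth values of the condition "the bulb is burnt" at each position.

```python
def obs(events, i, observable):
    return [e for e in events[:i] if e in observable]


def obs_point(events, i, observable):
    return i > 0 and events[i - 1] in observable


def obs_eq(ev1, i, ev2, j, observable):
    return (obs_point(ev1, i, observable) == obs_point(ev2, j, observable)
            and obs(ev1, i, observable) == obs(ev2, j, observable))


def knows(beta, trace, i, traces, observable):
    """K beta at (trace, i): beta holds at every observationally equivalent point."""
    events, _ = trace
    return all(beta(states2, j)
               for events2, states2 in traces
               for j in range(len(states2))
               if obs_eq(events, i, events2, j, observable))


# Hypothetical light-bulb plant: burning is hidden, switching on is observed.
OBSERVABLE = {"switch_light", "switch_dark"}
t1 = (["burn", "switch_dark"], [False, True, True])   # burnt, then switched on
t2 = (["switch_light"], [False, False])               # switched on, bulb fine
t3 = (["burn"], [False, True])                        # burnt, never switched on
TRACES = [t1, t2, t3]


def burnt(states, j):
    return states[j]


def not_burnt(states, j):
    return not states[j]
```

In this toy model, the bulb is known to be burnt only after the switch has been operated and no light was observed. Right after the hidden `burn` event the diagnoser knows nothing, because that point is indistinguishable from the initial one.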

In many situations, we are interested in considering formulas only at observation points. 
We do so by introducing the following abbreviation. 

Definition 2.6 (Observed). If $E_o$ is the set of observable events, given a formula $\phi$, we use
$\llcorner\phi\lrcorner$ (read “Observed $\phi$”) as abbreviation for $\phi \wedge Y \bigvee_{e \in E_o} e$.


2.5. Running Example. The Battery Sensor System (BSS) (Figure 1) will be our running
example. The BSS provides a redundant reading of the sensors to a device. Internal batteries 
provide backup in case of failure of the external power supply. The safety of the system 
depends on both of the sensors providing a correct reading. The system can work in three 
different operational modes: Primary, Secondary 1 and Secondary 2. In Primary mode,
each sensor is powered by the corresponding battery. In the Secondary modes, instead, 
both sensors are powered by the same battery; e.g., during Secondary 1, both Sensor 1 and 
Sensor 2 are powered by Battery 1. The Secondary modes are used to keep the system 
operational in case of faults. However, in the secondary modes, the battery in use will 
discharge faster.

[Block diagram of the BSS, including the Mode Selector; arrow legend: Power, Control, Data]

Figure 1. Running Example (Battery Sensor System)

We consider two possible recovery actions: i) Switch Mode, or ii) Replace the
Battery-Sensor Block (the dotted block in Figure 1). In order to decide which recovery to apply, we
are going to define a set of requirements connecting the faults to alarms. The faults and
observable information of the system are shown in Figure 2.

This example is particularly interesting because we can define two sources of delay: 
the batteries, and the device resilience to wrong inputs. The batteries provide a buffer for 
supplying power to the sensors. The size of this buffer is determined by the capacity of the 
battery, the initial charge, and the discharge rate. For the device, we assume that two valid
sensor readings are required for optimal behavior; however, the device can work in degraded mode
with only one valid reading for a limited amount of time. The device will stop working if
both sensors are providing invalid readings, or if one sensor has been providing an invalid 
reading for too long. 

Both a synchronous and an asynchronous version of this model are possible. In the
asynchronous model, we have an event for each possible combination of observations (e.g., “Mode
Primary & Battery 1 Low”). In the synchronous model, we also have an additional
observable event (tick) that represents the passing of time in the absence of any observable event.
This event forces the synchronization of the plant with the diagnoser. The key difference 
between the synchronous and asynchronous setting is the amount of information that we 
can infer in this particular case. For example, if we know the initial charge level of a battery, 
and we know its discharge rate (given by the operational mode), then at each point in time 
we can infer the current charge of the battery. By comparing our expectation with the 
available information, we can detect when something is not behaving as expected.
Unfortunately, there are practical settings in which the assumption of synchronicity is not realistic.
Therefore, our approach accounts for both the synchronous and asynchronous models. 


Observables             Possible Values
Mode                    Primary, Secondary 1, Secondary 2
Battery Level {1,2}     High, Mid, Low
Sensors Delta           Zero, Non-Zero ($|S1.Out - S2.Out| = 0$)
Device Status           On, Off

Component               Faults
Generator               Off ($G1_{Off}$, $G2_{Off}$)
Battery                 Leak ($B1_{Leak}$, $B2_{Leak}$)
Sensor                  Wrong Output ($S1_{WO}$, $S2_{WO}$)

Figure 2. Observables and Faults Summary





















M. BOZZANO, A. CIMATTI, M. GARIO, AND S. TONETTA 


To provide a better understanding of how the running example behaves, we provide
the LTS of each of the components. Figure 3 shows the LTS of the generator and switch.
We assume that the only way the generator can turn off is if a fault event occurs, thus the
model of the generator is rather simple. Also the switch features a rather simple model,
where the labels toS1 and toS2 are defined as:

• toS1: Mode = Secondary1 $\wedge$ Battery1.Double $\wedge$ Battery2.Offline

• toS2: Mode = Secondary2 $\wedge$ Battery1.Offline $\wedge$ Battery2.Double

thus they drive the change in operational mode of the batteries.



Figure 3. Generator (Left) and Switch (Right) LTS 


Figure 4 shows two slightly more complex components: the sensor and the device.
The sensor periodically outputs a good or a bad reading depending on the state it is in.
Notice that the transition from a good to a bad state can occur either because of a fault
(Wrong Output in Figure 2) or because the battery connected to the sensor has no charge
(Batt.c = 0). Notice, in particular, that neither event is observable. The device instead
has two main transitions. The stay transition is defined as S1.Value = S2.Value $\wedge$ Delta = Zero,
while degrade represents a discrepancy in the reading from the sensor that will eventually
lead to the device stopping: (S1.Value $\neq$ S2.Value) $\wedge$ Delta = Non-Zero. The values of
the sensors are not observable, but their difference is observable via the Delta variable.
Intuitively, the device has an intermediate state that works as a buffer, before reaching the
final Off state.




Figure 4. Sensor (Left) and Device (Right) LTS 

The most complex component, the battery, is presented in Figure 5. Vertical transitions
indicate a change in operational mode of the battery. The left half of the LTS indicates 
that the generator is working and feeding the battery (thus charging it) while the right half 
shows that the battery is not charging. Additionally, the two central columns describe the 
faulty behavior of the battery. This information is represented also in each state. Each
state has an additional self-loop (not in the picture) denoting the update of the charge of
the battery, following the update rule:

charge' = (charge + recharge — (load + leak)) mod C 

where C is the capacity, and the other variables depend on the state: 

(1) Charging: recharge = 1, Not Charging: recharge = 0 

(2) Primary: load = 1, Offline: load = 0, Double: load = 2 

(3) Nominal: leak = 0, Faulty: leak = 2 

Thus the charge of the battery can change at each step by an amount between $+1$ (Nominal, Offline,
Charging) and $-4$ (Faulty, Double, Not Charging), while staying within the bound [0, Capacity).
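
The update rule and the three parameter tables can be written down as a small function. This is a minimal sketch that applies the rule literally, including the modulo; the function name, the argument names, and the capacity value used in the comments are assumptions made for illustration.

```python
def next_charge(charge, capacity, charging, mode, faulty):
    """One application of charge' = (charge + recharge - (load + leak)) mod C."""
    recharge = 1 if charging else 0                            # (1) Charging / Not Charging
    load = {"Primary": 1, "Offline": 0, "Double": 2}[mode]     # (2) operational mode
    leak = 2 if faulty else 0                                  # (3) Nominal / Faulty
    return (charge + recharge - (load + leak)) % capacity


# Hypothetical battery with capacity 10.
best_case = next_charge(5, 10, charging=True, mode="Offline", faulty=False)    # 5 + 1
worst_case = next_charge(5, 10, charging=False, mode="Double", faulty=True)    # 5 - 4
```

Note that, taken literally, the modulo keeps the result in [0, Capacity) by wrapping around: with capacity 10, a charge of 2 in the worst case yields 8 rather than 0.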

Every time the update of the charge causes the charge to pass a threshold, the transition
raises one of the observable events Low, Mid, High. These events indicate when the charge of
the battery is above 20%, 50% and 80%. All other transitions are not observable. These 
transitions have been omitted from the figure to make it more readable. 



3. Formal Characterization 

3.1. Diagnoser. In our general setting, a plant is connected to components for Fault
Detection and Isolation, and for Fault Recovery, as depicted in Figure 6. The role of FDI is
to collect and analyze the observable information from the plant, and to turn on suitable
alarms associated with (typically unobservable) relevant conditions. The Fault Recovery
component is intended to apply suitable reconfiguration actions based on the alarms in
input. Recovery is beyond the scope of this work; we consider a system composed of the
plant and the FDI component.

An FDI component (also called diagnoser in the following) is a machine $D$ that
synchronizes with observable traces of the plant $P$. $D$ has a set $A$ of alarms that are activated
in response to the monitoring of $P$. Different mechanisms to connect a diagnoser to a plant
are possible. In the synchronous case, the plant is assumed to convey to the diagnoser
information at a fixed rate (including state sampling and values for event ports). This model
is adopted, for example, in [BCGT14, CPC03]. In this paper we focus on the more general
asynchronous case, where the diagnoser reacts to the observable events in the plant.¹



Figure 6. Integration of the FDIR and Plant 

Definition 3.1 (Diagnoser). Given a set $A$ of alarms and a partially observable plant
$P = \langle V^P, E^P, I^P, T^P, E^P_o \rangle$, a diagnoser is a deterministic LTS $D(A, P) = \langle V^D, E^D, I^D, T^D \rangle$
such that $E^P_o = E^D$, $V^P \cap V^D = \emptyset$ and $A \subseteq V^D$.

When clear from the context, we use D to indicate D(A,P). We assume that the events 
of the diagnoser coincide with the observable events of the plant. This means that the 
diagnoser does not have internal transitions: every transition of the diagnoser is associated 
with an observable transition of the plant. We say that the alarm $A$ is triggered when $A$ is
true after the diagnoser synchronized with the plant (i.e., when $\llcorner A \lrcorner$ is true).

Since the synchronous case is a particular case of the asynchronous composition, in the 
rest of the paper we assume that the plant and diagnoser are composed asynchronously: 
i.e., $D \otimes P$. Only observable events are used to perform synchronization.

The choice of using a deterministic diagnoser is driven by the following result, that 
makes it easier to understand how the diagnoser will react to the plant: 

Definition 3.2 (Diagnoser Matching trace). Given a diagnoser $D$ of $P$ and a trace $\sigma_P$ of $P$,
the diagnoser trace matching $\sigma_P$, denoted by $D(\sigma_P)$, is the trace $\sigma$ of $D$ such that $\sigma \otimes \sigma_P$
is a trace of $D \otimes P$.

Note that the notion of diagnoser matching trace is well defined because, since $D$ is
deterministic, there exists one and only one trace in $D$ matching $\sigma_P$.
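
The matching trace can be computed by letting the diagnoser follow the plant events one by one. The sketch below assumes that the deterministic diagnoser is given as an initial state plus a `step` function returning the unique successor for an observable event; on unobservable plant events the diagnoser state is left unchanged, as required by the frame condition of the asynchronous product. The function names and the latching diagnoser are assumptions made for this example.

```python
def matching_trace(init, step, plant_events, observable):
    """Diagnoser states visited while the plant executes plant_events."""
    state = init
    visited = [state]
    for e in plant_events:
        if e in observable:
            state = step(state, e)   # determinism: exactly one successor
        visited.append(state)        # hidden event: diagnoser state is unchanged
    return visited


# Hypothetical diagnoser with a single alarm A that latches once "low" is seen.
def step(state, event):
    return {"A": state["A"] or event == "low"}


trace = matching_trace({"A": False}, step, ["leak", "low", "drain"], {"low", "mid"})
alarm_values = [s["A"] for s in trace]
```

Since `step` is a function, each plant trace yields exactly one sequence of diagnoser states, which mirrors the uniqueness argument above.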


¹ The relation between the synchronous and the asynchronous combination is discussed in Section 8.1.

3.2. Detection, Identification, and Diagnosis Conditions. The first element for the
specification of the FDI requirements is given by the conditions that must be monitored. 
Here, we distinguish between detection and identification, which are the two extreme cases 
of the diagnosis problem; the first deals with knowing whether a fault occurred in the system, 
while the second tries to identify the characteristics of the fault. Between these two cases 
there can be intermediate ones: we might want to restrict the detection to a particular 
sub-system, or identification among two similar faults might not be of interest. 

The detection task is the problem of understanding when (at least) one of the
components has failed. The identification task tries to understand exactly which fault occurred.

In the BSS every component can fail. Therefore the detection problem boils down to
knowing that at least one of the generators, batteries or sensors is experiencing a fault. For
identification, instead, we are interested in knowing whether a specific fault (e.g., $G1_{Off}$)
occurred. There are also intermediate situations (sometimes called isolation), in which we
are not interested in distinguishing whether $G1_{Off}$ or $B1_{Leak}$ occurred, as long as we know
that there is a problem in the power-supply chain.

FDI components are generally used to recognize faults. However, there is no reason to
restrict our interest to faults. Recovery procedures might differ depending on the current
state of the plant; therefore, it might be important to consider other unobservable
information of the system. For example, we might want to estimate the charge level of a battery,
or its discharge rate.

We call the condition of the plant to be monitored diagnosis condition, denoted by $\beta$.
We assume that for any point in time along a trace execution of the plant (and therefore
also of the system), $\beta$ is either true or false based on what happened before that time
point. Therefore, $\beta$ can be an atomic condition (including faults), a sequence of atomic
conditions, or Boolean combination thereof. If $\beta$ is a fault, the fault must be identified; if
$\beta$ is a disjunction of faults, instead, it suffices to perform the detection, without identifying
the exact fault.


Diagnosis condition                            Definition
$\beta_{Generator1}$, $\beta_{Generator2}$     $G1_{Off}$, $G2_{Off}$
$\beta_{Battery1}$, $\beta_{Battery2}$         $B1_{Leak}$, $B2_{Leak}$
$\beta_{PSU1}$, $\beta_{PSU2}$                 $G1_{Off} \vee B1_{Leak}$, $G2_{Off} \vee B2_{Leak}$
$\beta_{Batteries}$                            $B1_{Leak} \vee B2_{Leak}$
$\beta_{Sensor1}$, $\beta_{Sensor2}$           $S1_{WO}$, $S2_{WO}$
$\beta_{Sensors}$                              $S1_{WO} \vee S2_{WO}$
$\beta_{BS}$                                   $(S1_{WO} \vee S2_{WO}) \vee (B1_{Leak} \wedge B2_{Leak})$
$\beta_{Seq}$                                  ($B1_{Charge}$ […] $B2_{Charge}$) $\wedge$ O($B1_{Charge}$ > $B2_{Charge}$)
$\beta_{Charging}$                             Y(($B1_{Charge}$ < 0) $\wedge$ Y($B1_{Charge}$ > 0)
$\beta_{Depleted}$                             $(B1_{Charge} = 0) \vee (B2_{Charge} = 0)$

Figure 7. Diagnosis conditions for the BSS

Figure 7 shows several examples of diagnosis conditions for the BSS. Notice how we
might be in complex situations such as knowing if the Battery-Sensor block is working ($\beta_{BS}$)
or knowing some information on the evolution of the system ($\beta_{Seq}$, $\beta_{Charging}$). We use LTL
operators to define those diagnosis conditions, but in general, we require that a diagnosis
condition can be evaluated on a point in a trace by only looking at the trace prefix.

[Timing diagrams: $\beta$, ExactDel($A$, $\beta$, 2), BoundDel($A$, $\beta$, 4), FiniteDel($A$, $\beta$)]

Figure 8. Examples of alarm responses to the diagnosis condition $\beta$.

3.3. Alarm Conditions. The second element of the specification of FDI requirements is 
the relation between a diagnosis condition and the raising of an alarm. This also leads to 
the definition of when the FDI is correct and complete with regard to a set of alarms. 

An alarm condition is composed of two parts: the diagnosis condition and the delay.
The delay relates the time between the occurrence of the diagnosis condition and the
corresponding alarm. Although it might be acceptable that the occurrence of a fault can go
undetected for a certain amount of time, it is important to specify clearly how long this
interval can be. An alarm condition is a property of the system composed by the plant and
the diagnoser, since it relates a condition of the plant with an alarm of the diagnoser. Thus,
when we say that a diagnoser $D$ of $P$ satisfies an alarm condition, we mean that the traces
of the system $D \otimes P$ satisfy it.

Interaction with industrial experts led us to identify three patterns of alarm conditions,
which we denote by ExactDel($A$, $\beta$, $d$), BoundDel($A$, $\beta$, $d$), and FiniteDel($A$, $\beta$):

1. ExactDel($A$, $\beta$, $d$) specifies that whenever $\beta$ is true, $A$ must be triggered exactly
$d$ steps later and $A$ can be triggered only if $d$ steps earlier $\beta$ was true; formally, for any
trace $\sigma$ of the system, if $\beta$ is true along $\sigma$ at the time point $i$, then $A$ is true in $\sigma[i + d]$
(Completeness); if $A$ is true in $\sigma[i]$, then $\beta$ must be true in $\sigma[i - d]$ (Correctness).

2. BoundDel($A$, $\beta$, $d$) specifies that whenever $\beta$ is true, $A$ must be triggered within
the next $d$ steps and $A$ can be triggered only if $\beta$ was true within the previous $d$ steps;
formally, for any trace $\sigma$ of the system, if $\beta$ is true along $\sigma$ at the time point $i$ then $A$ is
true in $\sigma[j]$ for some $i \le j \le i + d$ (Completeness); if $A$ is true in $\sigma[i]$, then $\beta$ must be
true in $\sigma[j']$ for some $i - d \le j' \le i$ (Correctness).

3. FiniteDel($A$, $\beta$) specifies that whenever $\beta$ is true, $A$ must be triggered in a later
step and $A$ can be triggered only if $\beta$ was true in some previous step; formally, for any
trace $\sigma$ of the system, if $\beta$ is true along $\sigma$ at the time point $i$ then $A$ is true in $\sigma[j]$ for
some $j \ge i$ (Completeness); if $A$ is true in $\sigma[i]$, then $\beta$ must be true along $\sigma$ in some time
point between 0 and $i$ (Correctness).
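
The three patterns above can be illustrated on finite traces. The following is a minimal sketch, not part of the original formalization: it assumes a trace is given as two Boolean lists of equal length (`beta[i]` and `alarm[i]` are the values of $\beta$ and $A$ at step $i$), and it ignores completeness obligations whose deadline falls beyond the end of the finite trace. The function names are hypothetical.

```python
from typing import Sequence


def exact_del(beta: Sequence[bool], alarm: Sequence[bool], d: int) -> bool:
    """Check ExactDel(A, beta, d) on one finite trace."""
    n = len(beta)
    # Completeness: beta at i forces the alarm at i + d (if i + d is inside the trace).
    complete = all(alarm[i + d] for i in range(n) if beta[i] and i + d < n)
    # Correctness: the alarm at i requires beta exactly d steps earlier.
    correct = all(i - d >= 0 and beta[i - d] for i in range(n) if alarm[i])
    return complete and correct


def bound_del(beta: Sequence[bool], alarm: Sequence[bool], d: int) -> bool:
    """Check BoundDel(A, beta, d) on one finite trace."""
    n = len(beta)
    # Completeness: beta at i forces the alarm somewhere in [i, i + d].
    complete = all(any(alarm[i:i + d + 1])
                   for i in range(n) if beta[i] and i + d < n)
    # Correctness: the alarm at i requires beta somewhere in [i - d, i].
    correct = all(any(beta[max(0, i - d):i + 1])
                  for i in range(n) if alarm[i])
    return complete and correct


def finite_del(beta: Sequence[bool], alarm: Sequence[bool]) -> bool:
    """Check FiniteDel(A, beta), treating the finite trace as a complete run."""
    n = len(beta)
    # Completeness: beta at i forces the alarm at some j >= i.
    complete = all(any(alarm[i:]) for i in range(n) if beta[i])
    # Correctness: the alarm at i requires beta somewhere in [0, i].
    correct = all(any(beta[:i + 1]) for i in range(n) if alarm[i])
    return complete and correct
```

Since FiniteDel completeness is a liveness requirement, a finite prefix can never truly falsify it; the check above is only meaningful under the stated assumption that the list represents the whole run.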

Figure 8 provides an example of admissible responses for the various alarms to the
occurrences of the same diagnosis condition $\beta$; note how in the case of BoundDel($A$, $\beta$, 4)
the alarm can be triggered at any point as long as it is within the next 4 time-steps. Since $A$
is a state variable and the diagnoser changes it only in response to synchronizations with the 
plant, every rising and falling edge of the alarm in the figure corresponds to an observation 
point. 

Figure 9 contains a simple specification for our running example. There are two types
of PSU (Power Supply Unit) alarms (that can be similarly defined for PSU 2). The first one 
defines multiple alarms, each having a different delay i. Let us assume that each battery 
has a capacity C of 10, and that this provides us with a delay of at most 10 time-units. 
Pattern                                                         Description

ExactDel($\mathit{PSU1Exact}_i$, $\beta_{PSU1}$, $i$)           Detect if the PSU 1 (Generator 1 + Battery 1)
                                                                is broken, in order to switch to secondary mode

BoundDel($\mathit{PSU1Bound}$, $\beta_{PSU1}$, $C$)             Detect if the PSU 1 (Generator 1 + Battery 1) was
                                                                broken within the bound, in order to switch to
                                                                secondary mode

BoundDel($\mathit{BS}$, $\beta_{BS}$, $DC$)                     Detect if the whole Battery-Sensor block is working
                                                                incorrectly, in order to replace it

FiniteDel($\mathit{Discharged}$, $\beta_{Depleted}$)            Detect if any of the batteries was ever completely
                                                                discharged


Figure 9. Example Specification for the BSS

We can instantiate 10 alarms, one for each $i \in [0, 10]$. Ideally, we want to detect the exact
moment in which the PSU stops working. However, this might not be possible due to
non-diagnosability. Therefore, we define a weaker version of the alarm ($\mathit{PSU1Bound}$), in which
we say that within the time-bound provided by the battery capacity ($C$) we want to know
if the PSU has stopped working. In Section 5.1 we will prove that one alarm condition is weaker
than the other. For most alarms, we specify what recovery can be applied to address the
problem. In this way, our process of defining the alarms of interest is driven by the recovery
procedures available. If there is no automated recovery for a given situation, time-bounds
might not be relevant anymore. Therefore, we use alarms to collect information on the
historical state of the system (e.g., the $\mathit{Discharged}$ alarm); notice, in fact, that FiniteDel
alarms have a permanent behavior, i.e., they can never be turned off.

3.4. Diagnosability. Given an alarm condition, we need to know whether it is possible to
build a diagnoser for it. In fact, there is no point in having a specification that cannot be
realized. This property is called diagnosability and was introduced in [SSL+95].

In this section, we define the concept of diagnosability for the different types of alarm
conditions. We proceed by first giving the definition of diagnosability in the traditional way
(à la Sampath) in terms of observationally equivalent traces w.r.t. the diagnosis condition.
Then, we prove that a plant $P$ is diagnosable iff there exists a diagnoser that satisfies the
specification.

Definition 3.3. Given a plant $P$ and a diagnosis condition $\beta$, we say that
ExactDel($A$, $\beta$, $d$) is diagnosable in $P$ iff for all $\sigma_1, i$ s.t. $\sigma_1, i \models \beta$,
$\mathit{ObsPoint}(\sigma_1, i + d)$ holds and for all $\sigma_2, j$, if $\mathit{ObsEq}((\sigma_1, i + d), (\sigma_2, j + d))$,
then $\sigma_2, j \models \beta$.

Therefore, an exact-delay alarm condition is not diagnosable in $P$ iff either there is no
synchronization after $d$ steps (note that this is not possible in the synchronous case) or
there exists a pair of traces $\sigma_1$ and $\sigma_2$ such that for some $i, j \ge 0$, $\sigma_1, i \models \beta$,
$\mathit{ObsEq}((\sigma_1, i + d), (\sigma_2, j + d))$, and $\sigma_2, j \not\models \beta$. We call such a pair a critical pair.
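
To make the notion of critical pair concrete, the following sketch searches for critical pairs over a finite, explicitly enumerated set of traces. It is an illustration under simplifying assumptions that are not in the original text: the setting is synchronous (every step is an observation point, so $i = j$), each step is a hypothetical pair `(observation, beta_holds)`, and two points are observationally equivalent when the observation prefixes up to that step coincide.

```python
from typing import List, Tuple

Step = Tuple[str, bool]   # (observation, whether beta holds at this step)
Trace = List[Step]


def critical_pairs(traces: List[Trace], d: int) -> List[Tuple[int, int, int]]:
    """Return triples (a, i, c): beta holds at step i of traces[a], traces[c] has
    the same observations up to step i + d, yet beta does not hold at step i of
    traces[c]. An empty result means ExactDel with delay d has no critical pair
    in this finite, synchronous trace set."""
    pairs = []
    for a, s1 in enumerate(traces):
        for i, (_, beta_holds) in enumerate(s1):
            if not beta_holds or i + d >= len(s1):
                continue
            prefix = [obs for obs, _ in s1[:i + d + 1]]
            for c, s2 in enumerate(traces):
                same_obs = (len(s2) > i + d
                            and [obs for obs, _ in s2[:i + d + 1]] == prefix)
                if same_obs and not s2[i][1]:
                    pairs.append((a, i, c))
    return pairs
```

For example, with traces that share the observations `a, b` but differ on the third observation, a delay of 0 yields critical pairs that disappear for the trace whose third observation is distinctive once the delay is raised to 1.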

Definition 3.4. Given a plant $P$ and a diagnosis condition $\beta$, we say that
BoundDel($A$, $\beta$, $d$) is diagnosable in $P$ iff for all $\sigma_1, i$ s.t. $\sigma_1, i \models \beta$ there exists $k$ s.t.
$i \le k \le i + d$, $\mathit{ObsPoint}(\sigma_1, k)$ and for all $\sigma_2, l$, if $\mathit{ObsEq}((\sigma_1, k), (\sigma_2, l))$, then there
exists $j$ s.t. $l - d \le j \le l$ and $\sigma_2, j \models \beta$.
Intuitively, k,l denote points that are observationally equivalent and i, j denote the
states where the condition occurred, and their relation is such that i and j do not occur 
more than d steps away from each other. 

This definition takes into account occurrences of $\beta$ that happened before $i$. Indeed,
we need to check occurrences up to $d$ states before and after $i$. Consider the two traces
$\sigma_1 = apbqc$ and $\sigma_2 = aqbpc$, where $a, b, c$ are observable events, and $\beta = p$. We can see that
we can justify $p$ in $\sigma_1$ by looking at the occurrence of $p$ in $\sigma_2$ that is in the future. However,
we cannot justify the $p$ in $\sigma_2$ by just looking in the future, but we need to look in the past.

Definition 3.5. Given a plant $P$ and a diagnosis condition $\beta$, we say that FiniteDel($A$, $\beta$)
is diagnosable in $P$ iff for all $\sigma_1, i$ s.t. $\sigma_1, i \models \beta$ there exists $k \ge i$ s.t. $\mathit{ObsPoint}(\sigma_1, k)$
and for all $\sigma_2, l$, if $\mathit{ObsEq}((\sigma_1, k), (\sigma_2, l))$ then there exists $j \le l$ s.t. $\sigma_2, j \models \beta$.

Definition 3.4 is a generalization of Sampath's definition of diagnosability:

Definition 3.6. (Diagnosability [SSL+95]) Given a plant $P$ and a diagnosis condition $\beta$,
we say that $\beta$ is diagnosable in $P$ iff there exists $d$ s.t. for all $\sigma_1, i, \sigma_2, l$, $k \ge i + d$, if $\sigma_1, i \models \beta$
and $obs(\sigma_2^l) = obs(\sigma_1^k)$ then there exists $j \le l$ s.t. $\sigma_2, j \models \beta$.

In [SSL+95] (specifically in Section II.A), Sampath et al. also assume that there are no
cycles of unobservable events. This means that there is a $d_u$ s.t. for all $\sigma, i$ s.t. $\sigma, i \models \beta$
there exists $k$ s.t. $0 \le k \le d_u$ and $\mathit{ObsPoint}(\sigma, i + k)$.

Theorem 3.7. Let $P$ be a plant such that there is no cycle of unobservable events, and let
$p$ be a propositional formula, then $p$ is diagnosable (as defined in Definition 3.6) in $P$ iff there
exists $d$ such that BoundDel($A$, $Op$, $d$) is diagnosable in $P$.

Proof.

$\Rightarrow$) Assume that $p$ is diagnosable in $P$. Consider a trace $\sigma_1$ such that for some $i \ge 0$,
$\sigma_1, i \models Op$. Then, for some $0 \le i' \le i$, $\sigma_1, i' \models p$. By assumption, we know that there
is a $d$ s.t. for all $k \ge i' + d$ and any trace $\sigma_2$ and point $l$ such that $obs(\sigma_2^l) = obs(\sigma_1^k)$
then $\sigma_2, j' \models p$ for some $j'$, $j' \le l$. Then $\sigma_2, j \models Op$ for all $j \ge j'$. Since this holds for
any $k$ and $l$, it holds also for the $k$ and $l$ that are observation points for $\sigma_1$ and $\sigma_2$. Let
$d' = d + d_u$. Then there exists $k' \le d'$ such that $\mathit{ObsPoint}(\sigma_1, i + k')$ and for all traces
$\sigma_2$ and points $l$ such that $\mathit{ObsEq}((\sigma_1, k'), (\sigma_2, l))$ then $\sigma_2, j' \models p$ for some $j'$, $j' \le l$. We
can conclude that BoundDel($A$, $Op$, $d'$) is diagnosable in $P$.

$\Leftarrow$) Assume that BoundDel($A$, $Op$, $d$) is diagnosable in $P$. Consider a trace $\sigma_1$ such
that for some $i \ge 0$, $\sigma_1, i \models p$. Then $\sigma_1, i \models Op$. By assumption, there exists $k$,
$i \le k \le i + d$ such that $\mathit{ObsPoint}(\sigma_1, k)$ and, for any trace $\sigma_2$ and point $l$ such that
$\mathit{ObsEq}((\sigma_1, k), (\sigma_2, l))$ then $\sigma_2, j \models Op$ for some $l - d \le j \le l$. Let us consider $\sigma_2$
and $l$ such that $obs(\sigma_2^l) = obs(\sigma_1^k)$. Then for some $l' \le l$ we have that $\mathit{ObsPoint}(\sigma_2, l')$
and therefore $\mathit{ObsEq}((\sigma_1, k), (\sigma_2, l'))$. Then $\sigma_2, j \models Op$ for some $l - d \le j \le l$. Thus
$\sigma_2, j' \models p$ for some $j' \le j$ and $p$ is diagnosable. □

The following theorem shows that if a component satisfies the diagnoser specification then 
the monitored plant must be diagnosable for that specification. In Section [6] on synthesis we 
will show also the converse, i.e., if the specification is diagnosable then a diagnoser exists. 

Theorem 3.8. Let D be a diagnoser for P. If D satisfies an alarm condition then the 
alarm condition is diagnosable in P. 
Proof. By contradiction, suppose ExactDel($A$, $\beta$, $d$) is not diagnosable in $P$. Then either
there exists a trace $\sigma_1$ with $\sigma_1, i \models \beta$ for some $i$ such that $\mathit{ObsPoint}(\sigma_1, j)$ is false for all
$j \ge i$, or there exists a critical pair. In the first case, $A$ is not triggered and the diagnoser
is not complete. Suppose there exists a critical pair of traces $\sigma_1$ and $\sigma_2$, i.e., for some
$i, j \ge 0$, $\sigma_1, i \models \beta$, $\mathit{ObsPoint}(\sigma_1, i + d)$, $\mathit{ObsEq}((\sigma_1, i + d), (\sigma_2, j + d))$, and $\sigma_2, j \not\models \beta$. Since
$D$ is deterministic, $D(\sigma_1)$ and $D(\sigma_2)$ have a common prefix compatible with $obs(\sigma_1^{i+d}) =
obs(\sigma_2^{j+d})$. If the diagnoser is complete then $A$ is triggered in $D(\sigma_1) \otimes \sigma_1$ at position $i + d$,
and so also in $D(\sigma_2) \otimes \sigma_2$ at position $j + d$, but in this way the diagnoser is not correct,
which is a contradiction. If the diagnoser is correct, then $A$ is not triggered in $D(\sigma_2) \otimes \sigma_2$ at
position $j + d$, and so neither in $D(\sigma_1) \otimes \sigma_1$ at position $i + d$, but in this way the diagnoser
is not complete, which is a contradiction.

Similarly, for FiniteDel($A$, $\beta$) and BoundDel($A$, $\beta$, $d$). □

The definition above of diagnosability might be stronger than necessary, since diagnosability 
is defined as a global property of the plant. Imagine the situation in which there is a critical 
pair and after removing this critical pair from the possible executions of the system, our 
system becomes diagnosable. This suggests that the system was “almost” diagnosable, and 
an ideal diagnoser would be able to perform a correct diagnosis in all the cases except 
one (i.e., the one represented by the critical pair). To capture this idea, we redefine the 
problem of diagnosability from a global property expressed on the plant, to a local property 
expressed on points of single traces. 

Definition 3.9. Given a plant $P$, a diagnosis condition $\beta$ and a trace $\sigma_1$ such that for
some $i \ge 0$, $\sigma_1, i \models \beta$, we say that ExactDel($A$, $\beta$, $d$) is trace diagnosable in $(\sigma_1, i)$ iff
$\mathit{ObsPoint}(\sigma_1, i + d)$ and for any trace $\sigma_2$, for all $j \ge 0$ such that
$\mathit{ObsEq}((\sigma_1, i + d), (\sigma_2, j + d))$, $\sigma_2, j \models \beta$.

Definition 3.10. Given a plant $P$, a diagnosis condition $\beta$, and a trace $\sigma_1$ such that for
some $i \ge 0$, $\sigma_1, i \models \beta$, we say that BoundDel($A$, $\beta$, $d$) is trace diagnosable in $(\sigma_1, i)$ iff there
exists $k$ s.t. $i \le k \le i + d$, $\mathit{ObsPoint}(\sigma_1, k)$, and for any $\sigma_2, l$, if $\mathit{ObsEq}((\sigma_1, k), (\sigma_2, l))$, then
there exists $j$ s.t. $l - d \le j \le l$ and $\sigma_2, j \models \beta$.

Definition 3.11. Given a plant $P$, a diagnosis condition $\beta$, and a trace $\sigma_1$ such that for
some $i \ge 0$, $\sigma_1, i \models \beta$, we say that FiniteDel($A$, $\beta$) is trace diagnosable in $(\sigma_1, i)$ iff there
exists $k \ge i$ s.t. $\mathit{ObsPoint}(\sigma_1, k)$ and for all $\sigma_2, l$, if $\mathit{ObsEq}((\sigma_1, k), (\sigma_2, l))$, then there exists
$j \le l$ s.t. $\sigma_2, j \models \beta$.

A specification that is trace diagnosable in a plant along all points of all traces is 
diagnosable in the classical sense, and we say it is system diagnosable. The concept of trace 
diagnosability does not impose any specific behavior to the diagnoser. However, it is an 
important concept that allows us to better characterize and understand the specification 
and the system. 

3.5. Maximality. As shown in Figure 8, bounded- and finite-delay alarms are correct if
they are raised within the valid bound. However, there are several possible variations of 
the same alarm in which the alarm is active in different instants or for different periods. 
We address this problem by introducing the concept of maximality. Intuitively, a maximal 
diagnoser is required to raise the alarms as soon as possible and as long as possible (without 
violating the correctness condition). 
Definition 3.12. $D$ is a maximal diagnoser for an alarm condition with alarm $A$ in $P$ iff for
every trace $\sigma_P$ of $P$, $D(\sigma_P)$ contains the maximum number of observable points $i$ such that
$D(\sigma_P), i \models A$; that is, if $D(\sigma_P), i \not\models A$, then there does not exist another correct diagnoser
$D'$ of $P$ such that $D'(\sigma_P), i \models A$.


4. Formal Specification 

In this section, we present the Alarm Specification Language with Epistemic operators
($ASL_K$). This language allows designers to define requirements on the FDI alarms including
aspects such as delays, diagnosability and maximality.

Diagnosis conditions and alarm conditions are formalized using LTL with past operators.
The definitions of trace diagnosability and maximality, however, cannot be captured by
using a formalization based on LTL. To capture these two concepts, we rely on temporal
epistemic logic. The intuition is that this logic enables us to reason on sets of observationally
equivalent traces instead of on single traces (as in LTL). We show how this logic can
be used to specify diagnosability, define requirements for non-diagnosable cases and express
the concept of maximality.

4.1. Diagnosis and Alarm Conditions as LTL Properties. Let $\mathcal{P}$ be a set of propositions
representing either faults, events or elementary conditions for the diagnosis. The set
$\mathcal{D}_{\mathcal{P}}$ of diagnosis conditions over $\mathcal{P}$ consists of the formulas $\beta$ built with the following rule:

$$\beta ::= p \mid \beta \wedge \beta \mid \neg\beta \mid O\beta \mid Y\beta$$

with $p \in \mathcal{P}$.
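
Because diagnosis conditions use only past operators, they can be evaluated at a point of a trace by looking at the prefix alone. The following sketch illustrates this with a small recursive evaluator for the grammar above. The class names and the representation of a trace as a list of sets of proposition names are assumptions made for the example, and $Y$ is taken to be false at the initial step, as is usual for past LTL.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Prop:
    name: str            # atomic proposition p


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class Once:
    arg: object          # O beta: beta held now or at some earlier step


@dataclass(frozen=True)
class Yesterday:
    arg: object          # Y beta: beta held at the previous step


def holds(phi: object, trace: List[Set[str]], i: int) -> bool:
    """Evaluate a diagnosis condition at step i using only trace[0..i]."""
    if isinstance(phi, Prop):
        return phi.name in trace[i]
    if isinstance(phi, And):
        return holds(phi.left, trace, i) and holds(phi.right, trace, i)
    if isinstance(phi, Not):
        return not holds(phi.arg, trace, i)
    if isinstance(phi, Once):
        return any(holds(phi.arg, trace, j) for j in range(i + 1))
    if isinstance(phi, Yesterday):
        return i > 0 and holds(phi.arg, trace, i - 1)
    raise TypeError(f"unsupported formula: {phi!r}")
```

A condition such as "the battery was charged at some point and is not charged now" would then be written `And(Once(Prop("charged")), Not(Prop("charged")))`.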

We provide the LTL characterization of the Alarm Specification Language (ASL) in
Figure 10. On the left column we provide the name of the alarm condition (as defined in
the previous section), and on the right column we provide the associated LTL formalization
encoding the concepts of correctness and completeness. Correctness, the first conjunct,
intuitively says that whenever the diagnoser raises an alarm, then the fault must have
occurred. Completeness, the second conjunct, intuitively encodes that whenever the fault
occurs, the alarm will be raised. In the following, for simplicity, we abuse notation and
indicate with $\varphi$ both the alarm condition and the associated LTL; for an alarm condition
$\varphi$, we denote by $A_\varphi$ the associated alarm variable $A$, and with $\tau(\varphi)$ the following formulas:

$\tau(\varphi) = Y^d\beta$ for $\varphi$ = ExactDel($A$, $\beta$, $d$);
$\tau(\varphi) = O^{\le d}\beta$ for $\varphi$ = BoundDel($A$, $\beta$, $d$);
$\tau(\varphi) = O\beta$ for $\varphi$ = FiniteDel($A$, $\beta$).

When clear from the context, we use just $A$ and $\tau$ instead of $A_\varphi$ and $\tau(\varphi)$, respectively.


Alarm Condition                   LTL Formulation

ExactDel($A$, $\beta$, $d$)       $G(A \to Y^d\beta) \wedge G(\beta \to X^d A)$
BoundDel($A$, $\beta$, $d$)       $G(A \to O^{\le d}\beta) \wedge G(\beta \to F^{\le d} A)$
FiniteDel($A$, $\beta$)           $G(A \to O\beta) \wedge G(\beta \to F A)$


Figure 10. Alarm conditions as LTL (ASL): Correctness and Completeness
Alarm Condition                   Diagnosability                                      Maximality

ExactDel($A$, $\beta$, $d$)       $G(\beta \to X^d K Y^d\beta)$                       $G(K Y^d\beta \to A)$
BoundDel($A$, $\beta$, $d$)       $G(\beta \to F^{\le d} K O^{\le d}\beta)$           $G(K O^{\le d}\beta \to A)$
FiniteDel($A$, $\beta$)           $G(\beta \to F K O\beta)$                           $G(K O\beta \to A)$


Figure 11. Diagnosability and Maximality


[Timing diagram comparing the signal $K O^{\le d}\beta$ with the alarm signals $A$ (Maximal) and
$A$ (Non-Maximal).]


Figure 12. Example of Maximal and Non-Maximal traces


4.2. Diagnosability as Epistemic Property. We can write the diagnosability test for
the different alarm conditions directly as epistemic properties. The general formulation is
presented on the left column of Figure 11. In order to test for system diagnosability, we
will check whether the formula holds for all traces of the system; while to check for trace
diagnosability we will check whether the formula holds for single points in a trace. For
example, the diagnosability test for ExactDel($A$, $\beta$, $d$) says that it is always the case that
whenever $\beta$ occurs, exactly $d$ steps afterwards, the diagnoser knows $\beta$ occurred $d$ steps
earlier. Since $K$ is defined on observationally equivalent traces, the only way to falsify
the formula would be to have a trace in which $\beta$ occurs, and another one (observationally
equivalent at least for the next $d$ steps) in which $\beta$ did not occur; but this is in contradiction
with the definition of diagnosability (Definition 3.3).

4.3. Maximality as Epistemic Property. The property of maximality says that the
diagnoser will raise the alarm as soon as it is possible to know the diagnosis condition, and
the alarm will stay up as long as possible. The property $K\tau \to A$ encodes this behavior:

Theorem 4.1. $D$ is maximal for $\varphi$ in $P$ iff $D \otimes P \models G(K\tau \to A)$.

Proof. $\Rightarrow$) Suppose $D$ is maximal and by contradiction $D \otimes P \not\models G(K\tau \to A)$. Thus,
there exists a trace $\sigma_P$ of $P$ and $i \ge 0$ such that $D(\sigma_P) \otimes \sigma_P, i \models (K\tau \wedge \neg A)$ (where
$D(\sigma_P)$ is the diagnoser trace matching $\sigma_P$ as defined in Definition 3.2). By Definition 2.6
of $K$, $i$ is an observation point. Let $i$ be the $j$-th observation point of $\sigma_P$. Consider $D'$
obtained by $D(\sigma_P)$ converting the trace into a transition system using a sink state so that
$D'$ is deterministic and setting $A$ to true only in the state $D(\sigma_P)[j]$ (thus triggering $A$ in $j$
and setting it to false at the next observation point). For every trace $\sigma'_P$ of $P$ matching with
$D'(\sigma_P)$, $obs(\sigma'_P) = obs(\sigma_P)$, and thus $\sigma'_P, i \models \tau$ (since $D(\sigma_P) \otimes \sigma_P, i \models K\tau$). Therefore
$D' \models G(A \to \tau)$, contradicting the hypothesis.

$\Leftarrow$) Suppose $D \otimes P \models G(K\tau \to A)$ and by contradiction $D$ is not maximal for
$\varphi$ in $P$. Then there exists a trace $\sigma_P$ of $P$ such that $D(\sigma_P), i \not\models A$ and there exists
another diagnoser $D'$ of $P$ such that $D'(\sigma_P), i \models A$ and $D' \otimes P \models G(A \to \tau)$. Then,
for some $j$, $D(\sigma_P) \otimes \sigma_P, j \not\models A$; $D'(\sigma_P) \otimes \sigma_P, j \models A$, and so $D(\sigma_P) \otimes \sigma_P, j \not\models K\tau$ and
$\sigma_P, j \models \tau$. Then there exists another trace $\sigma'_P$ of $P$ and $j'$ such that $\mathit{ObsEq}((\sigma'_P, j'), (\sigma_P, j))$
and $\sigma'_P, j' \not\models \tau$. Since $D'$ is deterministic, $D'(\sigma'_P)$ and $D'(\sigma_P)$ are equal up to position $i$,
and so $D' \otimes P \not\models G(A \to \tau)$, contradicting the hypothesis. □

Whenever the diagnoser knows that $\tau$ is satisfied, it will raise the alarm. An example of
maximal and non-maximal alarm is given in Figure 12. Note that according to our definition,
the set of maximal alarms is a subset of the non-maximal ones.
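
The reading "the maximal alarm is up exactly when $K\tau$ holds" can be illustrated on a finite set of traces. The sketch below is an illustration only and makes assumptions that are not in the original text: the setting is synchronous, the plant is given as an explicit finite list of traces that contains the trace being monitored, each step is a hypothetical pair `(observation, beta_holds)`, and $\tau$ is passed as a function of a trace and a position.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, bool]   # (observation, whether beta holds at this step)
Trace = List[Step]


def observations(trace: Trace, i: int) -> Tuple[str, ...]:
    """Observation prefix of the trace up to and including step i."""
    return tuple(obs for obs, _ in trace[:i + 1])


def once_within(d: int) -> Callable[[Trace, int], bool]:
    """tau for BoundDel: beta held at some step in [i - d, i]."""
    def tau(trace: Trace, i: int) -> bool:
        return any(beta for _, beta in trace[max(0, i - d):i + 1])
    return tau


def maximal_alarm(plant: List[Trace], trace: Trace,
                  tau: Callable[[Trace, int], bool]) -> List[bool]:
    """Alarm signal that is true at step i iff tau holds at step i of every
    plant trace with the same observation prefix, i.e. iff K tau holds."""
    signal = []
    for i in range(len(trace)):
        seen = observations(trace, i)
        equivalent = [t for t in plant
                      if len(t) > i and observations(t, i) == seen]
        signal.append(all(tau(t, i) for t in equivalent))
    return signal
```

Any correct alarm for the same condition can only be true at a subset of the steps where this signal is true, which is the sense in which the signal is maximal.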

A property related to Maximality is the capability of the diagnoser to justify the raising 
of the alarm. This property is guaranteed by construction by any correct diagnoser, as shown 
in the following theorem. 

Theorem 4.2. Given a diagnoser $D$ and a plant $P$, for each alarm $A$ of $D$, with temporal
condition $\tau$, if $D$ is correct for $A$ it holds that:

$$D \otimes P \models G(A \to K\tau)$$

Thus, whenever the diagnoser raises an alarm, it knows that the diagnosis condition has 
occurred. 

Proof. We assume by contradiction that $G(A \to K\tau)$ is not satisfied. Therefore,
there exist $\sigma$ and $i$ such that $D(\sigma) \otimes \sigma, i \models A \wedge \neg K\tau$ (where $D(\sigma)$ is the diagnoser trace
matching $\sigma$ as defined in Definition 3.2, and $\neg K\tau$ is interpreted according to Definition 2.6
of $K$). Thus, $\sigma, i \models \tau$ by correctness of $D$. In order for $\neg K\tau$ to hold, we need another
trace $\sigma'$ and $j$ s.t. $\mathit{ObsEq}((\sigma, i), (\sigma', j))$ and $\sigma', j \models \neg\tau$. By definition, the diagnoser is
deterministic, thus we know that for $\sigma, \sigma'$ at points $i, j$ we will have the same value of $A$.
Therefore, $D(\sigma') \otimes \sigma', j \models A \wedge \neg\tau$ so that $D$ is not correct, thus reaching a contradiction. □


4.4. $ASL_K$ Specifications. The formalization of $ASL_K$ (Figure 13) is obtained by extending
ASL (Figure 10) with the concepts of maximality and diagnosability, defined as epistemic
properties. When maximality is required we add a third conjunct following Theorem 4.1.
When Diag = Trace instead, we precondition the completeness to the trace diagnosability
(as defined in Figure 11): this means that the diagnoser will raise an alarm whenever the
diagnosis condition is satisfied and the diagnoser is able to know it.

Several simplifications are possible. For example, in the case Diag = Trace , we do not 
need to verify the completeness due to the following result: 

Theorem 4.3. Given a diagnoser $D$ for a plant $P$ and a trace diagnosable alarm condition
$\varphi$, if $D$ is maximal for $\varphi$, then $D$ is complete.

Proof. (ExactDel) For all $\sigma, i$, if $\sigma, i \models (\beta \to X^d K Y^d\beta)$, then by using the maximality
assumption, we know that $\sigma, i \models (\beta \to X^d A)$; thus, $\sigma, i \models (\beta \to X^d K Y^d\beta) \to (\beta \to X^d A)$.
Similarly we can prove BoundDel and FiniteDel. □

As a corollary of Theorem 4.3, the same can be applied also for system diagnosable alarm
conditions if $P$ is diagnosable, since system diagnosability implies trace diagnosability:

Theorem 4.4. Given an alarm condition $\varphi$ for the system diagnosable case, and a diagnoser
$D$ for a plant $P$, if $D$ is maximal for $\varphi$ and $\varphi$ is diagnosable in $P$ then $D$ is complete.

Proof. The theorem follows directly from Theorem 4.3 and the fact that if $D$ is complete
for a trace diagnosable alarm condition that is system diagnosable, then $D$ is also complete
for the corresponding system diagnosable alarm condition. □
Diag = System

  ExactDel
    Maximality = False:  $G(A \to Y^d\beta) \wedge G(\beta \to X^d A)$
    Maximality = True:   $G(A \to Y^d\beta) \wedge G(\beta \to X^d A) \wedge G(K Y^d\beta \to A)$

  BoundDel
    Maximality = False:  $G(A \to O^{\le d}\beta) \wedge G(\beta \to F^{\le d} A)$
    Maximality = True:   $G(A \to O^{\le d}\beta) \wedge G(\beta \to F^{\le d} A) \wedge G(K O^{\le d}\beta \to A)$

  FiniteDel
    Maximality = False:  $G(A \to O\beta) \wedge G(\beta \to F A)$
    Maximality = True:   $G(A \to O\beta) \wedge G(\beta \to F A) \wedge G(K O\beta \to A)$

Diag = Trace

  ExactDel
    Maximality = False:  $G(A \to Y^d\beta) \wedge G((\beta \to X^d K Y^d\beta) \to (\beta \to X^d A))$
    Maximality = True:   $G(A \to Y^d\beta) \wedge G((\beta \to X^d K Y^d\beta) \to (\beta \to X^d A)) \wedge G(K Y^d\beta \to A)$

  BoundDel
    Maximality = False:  $G(A \to O^{\le d}\beta) \wedge G((\beta \to F^{\le d} K O^{\le d}\beta) \to (\beta \to F^{\le d} A))$
    Maximality = True:   $G(A \to O^{\le d}\beta) \wedge G((\beta \to F^{\le d} K O^{\le d}\beta) \to (\beta \to F^{\le d} A)) \wedge G(K O^{\le d}\beta \to A)$

  FiniteDel
    Maximality = False:  $G(A \to O\beta) \wedge G((\beta \to F K O\beta) \to (\beta \to F A))$
    Maximality = True:   $G(A \to O\beta) \wedge G((\beta \to F K O\beta) \to (\beta \to F A)) \wedge G(K O\beta \to A)$


Figure 13. $ASL_K$ specification patterns among the four dimensions:
Diagnosability, Maximality, Completeness and Correctness.


This Theorem is interesting because it tells us that if a specification that was required to 
be system diagnosable is indeed system diagnosable, then we can just check whether the 
diagnoser is maximal and avoid performing the completeness test. 



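Diag = System

  ExactDel
    Maximality = False:  $G(A \to Y^d\beta) \wedge G(\beta \to X^d A)$
    Maximality = True:   $G(A \to Y^d\beta) \wedge G(\beta \to X^d A) \wedge G(K Y^d\beta \to A)$

  BoundDel
    Maximality = False:  $G(A \to O^{\le d}\beta) \wedge G(\beta \to F^{\le d} A)$
    Maximality = True:   $G(A \to O^{\le d}\beta) \wedge G(\beta \to F^{\le d} A) \wedge G(K O^{\le d}\beta \to A)$

  FiniteDel
    Maximality = False:  $G(A \to O\beta) \wedge G(\beta \to F A)$
    Maximality = True:   $G(A \to O\beta) \wedge G(\beta \to F A) \wedge G(K O\beta \to A)$

Diag = Trace

  ExactDel
    Maximality = False:  $G(A \to Y^d\beta) \wedge G(K Y^d\beta \to A)$
    Maximality = True:   $G(A \to Y^d\beta) \wedge G(K Y^d\beta \to A)$

  BoundDel
    Maximality = False:  $G(A \to O^{\le d}\beta) \wedge G((\beta \wedge F^{\le d} K O^{\le d}\beta) \to F^{\le d} A)$
    Maximality = True:   $G(A \to O^{\le d}\beta) \wedge G(K O^{\le d}\beta \to A)$

  FiniteDel
    Maximality = False:  $G(A \to O\beta) \wedge G((\beta \wedge F K O\beta) \to F A)$
    Maximality = True:   $G(A \to O\beta) \wedge G(K O\beta \to A)$


Figure 14. $ASL_K$ with simplified patterns for Diag = Trace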
Theorem 4.5. For all trace diagnosable and non-maximal ExactDel specifications,
completeness can be replaced by maximality. Formally, for all $\sigma$,
$\sigma \models G((\beta \to X^d K Y^d\beta) \to (\beta \to X^d A))$ iff $\sigma \models G(K Y^d\beta \to A)$.

Proof.

    $\sigma, i \models ((\beta \to X^d K Y^d\beta) \to (\beta \to X^d A))$
iff
    $\sigma, i \models ((\beta \wedge X^d K Y^d\beta) \to X^d A)$
iff
    $\sigma, i + d \models ((Y^d\beta \wedge K Y^d\beta) \to A)$
iff
    $\sigma, i + d \models (K Y^d\beta \to A)$

where the last step holds because $K Y^d\beta$ implies $Y^d\beta$.

Therefore, we can conclude that for all $i$, $\sigma, i \models ((\beta \to X^d K Y^d\beta) \to (\beta \to X^d A))$ iff for
all $j \ge d$, $\sigma, j \models (K Y^d\beta \to A)$. We conclude noting that for $j < d$, $Y^d\beta$ is false and
therefore $\sigma, j \models (K Y^d\beta \to A)$. □

After applying the simplifications specified in Theorem 4.3 and Theorem 4.5 and the
equivalence $(a \to b) \to (a \to c) \equiv (a \wedge b) \to c$, we obtain the table in Figure 14 where the
patterns in the lower half (Diag = Trace) have been simplified.

An $ASL_K$ specification is built by instantiating the patterns defined in Figure 13. For
example, we would write ExactDel$_K$($A$, $\beta$, $d$, Trace, True) for an exact-delay alarm $A$
for $\beta$ with delay $d$, that satisfies the trace diagnosability property and is maximal. An
introductory example on the usage of $ASL_K$ for the specification of a diagnoser is provided
in [BCGT13]. Figure 15 shows how we extend the specification for the BSS by introducing
requirements on the diagnosability and maximality of alarms. In particular, all the alarms
that we defined are not system diagnosable. Therefore, we need to weaken the requirements
and make them trace-diagnosable. The patterns are then converted into temporal epistemic
formulae as shown in Figure 16.
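
The instantiation of a pattern into a temporal epistemic formula can be sketched as a small string-building function that follows the unsimplified patterns of Figure 13. This is an illustration only: the function name `asl_k`, its parameters, and the ASCII rendering of the operators (`->`, `&`, `Y^d`, `O<=d`, `F<=d`) are assumptions made for the example and are not a syntax defined in the text.

```python
from typing import Optional


def asl_k(template: str, alarm: str, beta: str, d: Optional[int] = None,
          diag: str = "System", maximal: bool = False) -> str:
    """Build the formula for one alarm pattern as an ASCII string."""
    if template == "ExactDel":
        tau, future = f"Y^{d} {beta}", f"X^{d}"
    elif template == "BoundDel":
        tau, future = f"O<={d} {beta}", f"F<={d}"
    elif template == "FiniteDel":
        tau, future = f"O {beta}", "F"
    else:
        raise ValueError(f"unknown template: {template}")

    # Correctness: the alarm is raised only if tau holds.
    correctness = f"G({alarm} -> {tau})"
    # Completeness: beta is eventually followed by the alarm.
    completeness = f"({beta} -> {future} {alarm})"
    if diag == "Trace":
        # Completeness is required only where the condition is trace diagnosable.
        completeness = f"(({beta} -> {future} K {tau}) -> {completeness})"
    elif diag != "System":
        raise ValueError(f"unknown diagnosability mode: {diag}")

    conjuncts = [correctness, f"G{completeness}"]
    if maximal:
        # Maximality: the alarm is raised whenever tau is known.
        conjuncts.append(f"G(K {tau} -> {alarm})")
    return " & ".join(conjuncts)
```

For instance, `asl_k("FiniteDel", "Discharged", "b_Depleted", diag="Trace")` produces the unsimplified trace-diagnosable pattern; the simplifications of Figure 14 are deliberately not applied here.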


ExactDel$_K$($\mathit{PSU1Exact}_i$, $\beta_{PSU1}$, $i$, Trace, True)
BoundDel$_K$($\mathit{PSU1Bound}$, $\beta_{PSU1}$, $C$, Trace, True)
BoundDel$_K$($\mathit{BS}$, $\beta_{BS}$, $DC$, Trace, True)
FiniteDel$_K$($\mathit{Discharged}$, $\beta_{Depleted}$, Trace, False)
FiniteDel$_K$($\mathit{B1Leak}$, $\beta_{Battery1}$, System, True)


Figure 15. $ASL_K$ Specification for the BSS


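Alarm                       Formula

$\mathit{PSU1Exact}_i$      $G(\mathit{PSU1Exact}_i \to Y^i\beta_{PSU1}) \wedge G(K Y^i\beta_{PSU1} \to \mathit{PSU1Exact}_i)$

$\mathit{PSU1Bound}$        $G(\mathit{PSU1Bound} \to O^{\le C}\beta_{PSU1}) \wedge G(K O^{\le C}\beta_{PSU1} \to \mathit{PSU1Bound})$

$\mathit{BS}$               $G(\mathit{BS} \to O^{\le DC}\beta_{BS}) \wedge G(K O^{\le DC}\beta_{BS} \to \mathit{BS})$

$\mathit{Discharged}$       $G(\mathit{Discharged} \to O\beta_{Depleted}) \wedge G((\beta_{Depleted} \wedge F K O\beta_{Depleted}) \to F\,\mathit{Discharged})$

$\mathit{B1Leak}$           $G(\mathit{B1Leak} \to O\beta_{Battery1}) \wedge G(\beta_{Battery1} \to F\,\mathit{B1Leak}) \wedge G(K O\beta_{Battery1} \to \mathit{B1Leak})$


Figure 16. $KL_1$ translation of $ASL_K$ patterns for the BSS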
In the BSS, if we assume at most one fault, then the sensor faults are neither system nor
trace diagnosable since we are only able to observe the difference in output of the sensors, 
and therefore we can never be sure of which sensor is experiencing the fault. Restricting 
the model to two faults, instead, makes it possible to detect when both sensors are faulty, 
since the device stops working. The Battery Leak is trace diagnosable but not system 
diagnosable. This means that in general, we cannot detect the battery leak, but there 
is at least one execution in which we can. In particular, this is the execution in which 
the mode becomes Secondary 2 when Battery 1 was charged, and we can see the battery 
discharging, thus detecting the fault. Note that to detect this fault, we need to recall the 
fact that previously the battery was charged, and therefore a simple diagnoser without 
memory would not be able to detect this fault. 

5. Validation and Verification of $ASL_K$ Specifications

Thanks to the formal characterization of $ASL_K$, it is possible to apply formal methods for
the validation and verification of a set of FDI requirements. In validation we verify that 
the requirements capture the interesting behaviors and exclude the spurious ones, before 
proceeding with the design of the diagnoser. In verification, we check that a candidate 
diagnoser fulfills a set of requirements. 

5.1. Validation. Given a specification $\mathcal{A}$ for our diagnoser, we want to make sure that
it captures the designer's expectations. Known techniques for requirements validation
(e.g., [CRST12]) include checking their consistency and their realizability, i.e., whether they
can be implemented on a given plant. Moreover, often we want to show that there exists
some condition under which the alarm might be triggered (possibility), and some other
conditions that require the alarm to be triggered (necessity).

By construction, an $ASL_K$ specification is always consistent, i.e., there are no internal
contradictions. This is due to the fact that alarm specifications do not interact with each
other, and each alarm specification can always be satisfied by a diagnosable plant. Moreover,
in Section 6 we will prove that we can always synthesize a diagnoser satisfying $\mathcal{A}$, with
the only assumption that if $\mathcal{A}$ contains some system diagnosable alarm condition, then that
condition is diagnosable in the plant. Thus, the check for realizability reduces to checking
that the plant is diagnosable for the system diagnosable conditions in $\mathcal{A}$. The diagnosability
check can be performed via epistemic model-checking (Section 4.2) or it can be reduced to
an LTL model-checking problem using the twin-plant construction [CPC03].

An alarm that is always (or never) triggered is not useful. Therefore, we need to check 
under which conditions the alarm can and cannot be triggered. Moreover, there might be 
some assumptions on the environment of the diagnoser (including details on the plant) that 
might have an impact on the alarms. For example, if we have a single fault assumption
for our system, an alarm that implicitly depends on the occurrence of two faults will never be 
triggered. Similarly, our assumptions on the environment might provide some link between 
the behavior of different components, or dynamics of faults and thus characterize the relation 
between different alarms. 

We consider a set of environmental assumptions E expressed as LTL properties. This 
set can be empty, or include detailed information on the behavior of the environment and 
plant, since throughout the different phases of the development process, we have access to 
better versions of the plant model, and therefore the analysis can be refined. 
When checking possibility we want that the alarms can be eventually activated, but
also that they are not always active. This means that for a given alarm condition $\varphi \in \mathcal{A}$,
we are interested in verifying that there is a trace $\sigma \in E$ and a trace $\sigma' \in E$ s.t. $\sigma \models F A_\varphi$
and $\sigma' \models F \neg A_\varphi$. This can be done by checking that neither $(E \wedge \varphi) \to G\neg A_\varphi$
nor $(E \wedge \varphi) \to G A_\varphi$ is valid.

Checking necessity provides us a way to understand whether there is some correlation
between alarms. This, in turn, makes it possible to simplify the model, or to guarantee
some redundancy requirement. To check whether $A_{\varphi'}$ is a more general alarm than $A_\varphi$
(subsumption) we check whether $(E \wedge \varphi \wedge \varphi') \to G(A_\varphi \to A_{\varphi'})$ is valid. An example of
subsumption of alarms is given by the definition of maximality: any non-maximal alarm
subsumes its corresponding maximal version. Finally, we can verify that two alarms are
mutually exclusive by checking the validity of $(E \wedge \varphi \wedge \varphi') \to G\neg(A_\varphi \wedge A_{\varphi'})$.
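
For use with a satisfiability checker, the checks of this section can be restated as satisfiability queries through the duality $\neg G\psi \equiv F\neg\psi$. The following is a sketch of that restatement, writing $A$ and $A'$ for $A_\varphi$ and $A_{\varphi'}$; it is a standard rewriting offered as an illustration, not a procedure prescribed by the text.

```latex
\begin{align*}
\text{possibility (alarm can be raised):} &\quad E \wedge \varphi \wedge F A
    \ \text{ is satisfiable}\\
\text{possibility (alarm can be off):} &\quad E \wedge \varphi \wedge F \neg A
    \ \text{ is satisfiable}\\
\text{subsumption } (A' \text{ more general than } A)\text{:} &\quad
    E \wedge \varphi \wedge \varphi' \wedge F(A \wedge \neg A')
    \ \text{ is unsatisfiable}\\
\text{mutual exclusion:} &\quad
    E \wedge \varphi \wedge \varphi' \wedge F(A \wedge A')
    \ \text{ is unsatisfiable}
\end{align*}
```

Each unsatisfiability query is the negation of the corresponding validity check, so a satisfying trace, when one exists, is a counterexample to subsumption or to mutual exclusion.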

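To clarify the concepts presented in this section, we apply a necessity check on our 
running example. In the Battery-Sensor, we have two alarms specified on PSU1 (Figure 15): 
$PSU1_{Exact_i}$ and $PSU1_{Bound}$. Let's take $i = C = 2$, thus obtaining: 

- $\textsc{ExactDel}_K(PSU1_{Exact_2}, \beta_{PSU1}, 2, Trace, True)$ 

- $\textsc{BoundDel}_K(PSU1_{Bound}, \beta_{PSU1}, 2, Trace, True)$ 

We want to show that $PSU1_{Exact_2}$ is more specific than (is subsumed by) $PSU1_{Bound}$. This 
means that for any plant and diagnoser, the following holds: 

\[
D \otimes P \models (\varphi_{PSU1_{Exact_2}} \wedge \varphi_{PSU1_{Bound}}) \rightarrow G(\llcorner PSU1_{Exact_2} \lrcorner \rightarrow \llcorner PSU1_{Bound} \lrcorner)
\]

By renaming with $PE = PSU1_{Exact_2}$ and $PB = PSU1_{Bound}$ (for brevity) and expanding 
the definitions of $\varphi_{PSU1_{Exact_2}} \wedge \varphi_{PSU1_{Bound}}$ we have that 

\[
\begin{aligned}
D \otimes P \models{} & (G(\llcorner PE \lrcorner \rightarrow Y^2\beta) \wedge G(\llcorner K Y^2\beta \lrcorner \rightarrow \llcorner PE \lrcorner) \wedge {} \\
& \phantom{(} G(\llcorner PB \lrcorner \rightarrow O^{\leq 2}\beta) \wedge G(\llcorner K O^{\leq 2}\beta \lrcorner \rightarrow \llcorner PB \lrcorner)) \\
& \rightarrow G(\llcorner PE \lrcorner \rightarrow \llcorner PB \lrcorner)
\end{aligned}
\]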

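We can apply Theorem 4.2 and therefore write: 

\[
\begin{aligned}
D \otimes P \models{} & (G(\llcorner PE \lrcorner \rightarrow Y^2\beta) \wedge G(\llcorner K Y^2\beta \lrcorner \rightarrow \llcorner PE \lrcorner) \wedge {} \\
& \phantom{(} G(\llcorner PB \lrcorner \rightarrow O^{\leq 2}\beta) \wedge G(\llcorner K O^{\leq 2}\beta \lrcorner \rightarrow \llcorner PB \lrcorner) \wedge {} \\
& \phantom{(} G(\llcorner PE \lrcorner \rightarrow \llcorner K Y^2\beta \lrcorner) \wedge G(\llcorner PB \lrcorner \rightarrow \llcorner K O^{\leq 2}\beta \lrcorner)) \\
& \rightarrow G(\llcorner PE \lrcorner \rightarrow \llcorner PB \lrcorner)
\end{aligned}
\]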

To prove that the above formula is valid (and therefore it is satisfied by any plant and 
diagnoser), we prove that its negation is unsatisfiable: 

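\[
\begin{aligned}
& (G(\llcorner PE \lrcorner \rightarrow Y^2\beta) \wedge G(\llcorner K Y^2\beta \lrcorner \rightarrow \llcorner PE \lrcorner) \wedge {} \\
& \phantom{(} G(\llcorner PB \lrcorner \rightarrow O^{\leq 2}\beta) \wedge G(\llcorner K O^{\leq 2}\beta \lrcorner \rightarrow \llcorner PB \lrcorner) \wedge {} \\
& \phantom{(} G(\llcorner PE \lrcorner \rightarrow \llcorner K Y^2\beta \lrcorner) \wedge G(\llcorner PB \lrcorner \rightarrow \llcorner K O^{\leq 2}\beta \lrcorner)) \\
& \wedge \neg G(\llcorner PE \lrcorner \rightarrow \llcorner PB \lrcorner)
\end{aligned}
\]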

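The first part of this formula is composed of conjuncts of the form $G\psi$. This means that a 
counterexample is a trace for which each state satisfies $\psi$. Moreover, we need one of these 
states to satisfy $(\llcorner PE \lrcorner \wedge \neg \llcorner PB \lrcorner)$. Therefore, to prove the unsatisfiability of the above formula, 
we can just prove that no state exists that satisfies: 

\[
\begin{aligned}
& ((\llcorner PE \lrcorner \rightarrow Y^2\beta) \wedge (\llcorner K Y^2\beta \lrcorner \rightarrow \llcorner PE \lrcorner) \wedge {} \\
& \phantom{(} (\llcorner PB \lrcorner \rightarrow O^{\leq 2}\beta) \wedge (\llcorner K O^{\leq 2}\beta \lrcorner \rightarrow \llcorner PB \lrcorner) \wedge {} \\
& \phantom{(} (\llcorner PE \lrcorner \rightarrow \llcorner K Y^2\beta \lrcorner) \wedge (\llcorner PB \lrcorner \rightarrow \llcorner K O^{\leq 2}\beta \lrcorner)) \\
& \wedge \llcorner PE \lrcorner \wedge \neg \llcorner PB \lrcorner
\end{aligned}
\]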

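We show this by a contradiction since: 

\[
\begin{array}{ll}
\ldots \wedge \llcorner PE \lrcorner \wedge \neg \llcorner PB \lrcorner & \\
\ldots \wedge \llcorner \top \lrcorner \wedge PE \wedge \neg PB & \text{ObsPoint Def.} \\
\ldots \wedge \llcorner \top \lrcorner \wedge PE \wedge \neg PB \wedge K YY\beta & \text{Theorem 4.2 on } PE \\
\ldots \wedge \llcorner \top \lrcorner \wedge PE \wedge \neg PB \wedge K YY\beta \wedge \neg K O^{\leq 2}\beta & \text{Maximality of } PB \\
\ldots \wedge \llcorner \top \lrcorner \wedge PE \wedge \neg PB \wedge K YY\beta \wedge \neg O^{\leq 2}\beta & \dagger\ \text{Def. of } \neg K \\
\ldots \wedge \llcorner \top \lrcorner \wedge PE \wedge \neg PB \wedge K YY\beta \wedge \neg(\beta \vee Y\beta \vee YY\beta) & \text{Def. of } O^{\leq n} \\
\ldots \wedge \llcorner \top \lrcorner \wedge PE \wedge \neg PB \wedge YY\beta \wedge \neg\beta \wedge \neg Y\beta \wedge \neg YY\beta & K \text{ Axiom } (K\varphi \rightarrow \varphi)
\end{array}
\]

Thus reaching a contradiction between $YY\beta$ and $\neg YY\beta$. In the step marked with $\dagger$ we 
need to show that two observationally equivalent traces exist s.t. one satisfies $O^{\leq 2}\beta$ and 
the other $\neg O^{\leq 2}\beta$; therefore, we only need to show that one of the two (namely $\neg O^{\leq 2}\beta$) 
does not exist. 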

5.2. Verification. The verification of a system w.r.t. a specification can be performed via 
model-checking techniques using the semantics of the alarm conditions: 

Definition 5.1. Let $D$ be a diagnoser for alarms $A$ and plant $P$. We say that $D$ satisfies 
a set $\mathcal{A}$ of $ASL_K$ specifications iff for each $\varphi$ in $\mathcal{A}$ there exists an alarm $A_\varphi \in A$ and 
$D \otimes P \models \varphi$. 

To perform this verification step, we need in general a model checker for $KL_1$ with 
asynchronous/synchronous perfect recall such as MCK [GM04]. However, if the specification 
falls in the pure LTL fragment (ASL) we can verify it with an LTL model-checker such as 
nuXmv [CCD+14], thus benefiting from the efficiency of the tools in this area. 

Moreover, a diagnoser is required to be deterministic. This is important, on one hand, 
for implementability, and on the other hand, to ensure that the composition of the plant with 
the diagnoser does not reduce the behaviors of the plant. In order to verify that a given 
diagnoser $D = \langle V, E, I, T \rangle$ is deterministic, we check the following conditions: 

• $I$ must be satisfiable, 

• $I \wedge I[V_c/V] \rightarrow V = V_c$ must be valid, 

• for all $e \in E$, $\forall V \exists V'. T(e)$ must be valid (note that this corresponds to the validity of the 
pre-image of $T$), 

• for all $e \in E$, $T(e) \wedge T(e)[V_c/V'] \rightarrow V' = V_c$ must be valid. 

Therefore, we can solve the problem with a finite set of satisfiability checks and pre-image 
computations. 
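
The conditions above are stated symbolically. As a rough illustration, and assuming an explicit-state encoding in which the diagnoser is given as a set of states, a set of initial states and, for every event, a set of (state, successor) pairs, the same four conditions reduce to simple counting. The encoding and the name `is_deterministic` are assumptions made for this sketch; a symbolic implementation would use satisfiability checks instead.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
# trans[e] is the set of (state, next_state) pairs allowed by event e.
# Assumption: trans has one entry for every event of the diagnoser.
Trans = Dict[str, Set[Tuple[State, State]]]


def is_deterministic(states: Set[State], initial: Set[State], trans: Trans) -> bool:
    # Conditions 1 and 2: the initial condition is satisfiable and has
    # a single model, i.e. there is exactly one initial state.
    if len(initial) != 1:
        return False
    for pairs in trans.values():
        successors: Dict[State, Set[State]] = {s: set() for s in states}
        for src, dst in pairs:
            successors.setdefault(src, set()).add(dst)
        # Conditions 3 and 4: for this event every state has at least one
        # successor (totality) and at most one successor (uniqueness).
        if any(len(targets) != 1 for targets in successors.values()):
            return False
    return True


# Made-up two-state diagnosers over a single event "obs".
states = {"d0", "d1"}
total_and_unique = {"obs": {("d0", "d1"), ("d1", "d1")}}
not_total = {"obs": {("d0", "d1")}}
ambiguous = {"obs": {("d0", "d0"), ("d0", "d1"), ("d1", "d1")}}

print(is_deterministic(states, {"d0"}, total_and_unique))  # True
print(is_deterministic(states, {"d0"}, not_total))         # False
print(is_deterministic(states, {"d0"}, ambiguous))         # False
```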

6. Synthesis of a Diagnoser from an $ASL_K$ Specification 

In this section, we discuss how to synthesize a diagnoser that satisfies a given specification 
$\mathcal{A}$. We consider the most expressive case of $ASL_K$ (maximal/trace diagnosable), which 
also satisfies all the other cases. 

The idea is to generate an automaton that encodes the set of possible states in which 
the plant could be after each observation. The result is achieved by generating the power-set 
of the states of the plant, also called belief states, and defining a suitable transition 
relation among the elements of this set, only taking into account observable information. 
Each belief state of the automaton is then annotated with the alarms that are satisfied in 
all the states of the belief state. The resulting automaton is the Diagnoser. 

The approach resembles the constructions by Sampath [SSL+96] and Schumann [Sch04], 
with the following main differences. First, we consider LTL Past expressions as diagnosis 
conditions, and not only fault events as done in previous works. Second, instead of providing 
a set of possible diagnoses, we provide alarms. In order to raise the alarm, we need to be 
certain that the alarm condition is satisfied for all possible diagnoses. This gives rise to a 
3-valued alarm system: we know that the fault occurred; we know that the fault did not occur; 
or we are uncertain. Moreover, the approach works for the asynchronous case. Although 
the use of a power-set construction in the setting of temporal epistemic logic is not novel 
(e.g. [Dim09] for synchronous CTLK model-checking), the main contribution of this section 
is to show the formal properties of the diagnoser, and in particular that it satisfies the 
specification. In a way, this algorithm is a strong indicator of a deep connection between 
the topics of temporal epistemic logic reasoning and FDI design. 

6.1. Synthesis algorithm. Given a partially observable plant $P = \langle V^P, E^P, I^P, T^P, E^P_o \rangle$, 
let $S$ be the set of states of $P$. The belief automaton is defined as $B(P) = \langle B, E, b_0, R \rangle$ 
where $B = 2^S$, $E = E^P_o$, $b_0 \in B$ and $R : (B \times E) \rightarrow B$. $B$ represents the set of sets of states, 
also called belief states. Given a belief state $b$, we use $b^*$ to represent the set of states that 
are reachable from $b$ by only using events in $E^P \setminus E^P_o$ (non-observable events), and call it the 
u-transitive closure. Formally, $b^*$ is the least set s.t. $b \subseteq b^*$ and if there exist $e \in E^P \setminus E^P_o$ 
and $s' \in b^*$ such that $(s', s) \in T^P(e)$ then $s \in b^*$. $b_0$ is the initial belief state and contains 
the states that satisfy the initial condition $I^P$ (i.e., $b_0 = \{s \mid s \models I^P\}$). 

Given a belief state $b$ and an observable event $e \in E^P_o$, we define the successor belief 
state $b'$ as: 

\[
R(b, e) = b' = \{s' \mid \exists s \in b^*.\ (s, s') \models T^P(e)\}
\]

that is the set of states that are compatible with the observable event $e$ in a state of the 
u-transitive closure of $b$. Intuitively, we first compute the u-transitive closure of $b$ to account 
for all non-observable transitions, and then we consider all the different states that can be 
reached from $b^*$ with an occurrence of the event $e$. 
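
As a small worked instance of these two definitions, consider a hypothetical three-state plant (not the battery model of the running example) with one non-observable event $u$ and one observable event $o$. The state and event names are chosen only for illustration.

```latex
\begin{gather*}
S = \{s_0, s_1, s_2\}, \qquad E^P = \{u, o\}, \qquad E^P_o = \{o\}, \\
T^P(u) = \{(s_0, s_1)\}, \qquad T^P(o) = \{(s_0, s_0), (s_1, s_2)\}. \\[4pt]
% u-transitive closure: s_1 is reachable from s_0 through the non-observable event u
b = \{s_0\} \quad\Longrightarrow\quad b^* = \{s_0, s_1\}, \\[4pt]
% successor on o: collect the o-successors of every state in b^*
R(b, o) = \{s' \mid \exists s \in b^*.\ (s, s') \models T^P(o)\} = \{s_0, s_2\}.
\end{gather*}
```

Without the closure step the successor would be $\{s_0\}$ only, and the diagnoser would wrongly exclude $s_2$ after observing $o$.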

function belief_automaton(I, T, E, E_o) 
    visited ← {} 
    edges ← {} 
    stack ← [I] 

    while not stack.is_empty() do 
        b ← stack.pop() 
        b* ← u_trans_closure(b, T, E) 
        for all o ∈ get_observable_events(b*, T, E_o) do 
            target_belief ← reachable_w_obs(b*, o, T) 
            edges.add((b, o, target_belief)) 
            if target_belief ∉ visited then 
                visited.add(target_belief) 
                stack.push(target_belief) 
            end if 
        end for 
    end while 

    return Automaton(visited, edges) 
end function 

Figure 17. Pseudo-code of the Belief Automaton construction phase 

The diagnoser is obtained by annotating each state of the belief automaton with the 
corresponding alarms. We annotate with $A_\varphi$ all the states $b$ that satisfy the temporal 
property $\tau(\varphi)$. As explained later on, any temporal $\tau(\varphi)$ can be handled by introducing 
suitable propositional formulas. Therefore we consider the simplest case in which $\tau(\varphi)$ is 
a propositional formula and formally say that the annotation $a_b$ of the belief state $b$ is the 
assignment to $A_\varphi$ such that $a_b(A_\varphi)$ is true iff for all $s \in b$, $s \models \tau(\varphi)$. We perform the 
same annotation for $A_{\neg\varphi}$. The diagnoser obtained by this algorithm induces three alarms, 
related to the knowledge of the diagnoser. In particular, the diagnoser can be sure that 
a condition occurred ($A_\varphi$), can be sure that a condition did not occur ($A_{\neg\varphi}$), or can be 
uncertain on whether the condition occurred ($\neg A_\varphi \wedge \neg A_{\neg\varphi}$). Notice that, by construction, 
it is not possible for both $A_\varphi$ and $A_{\neg\varphi}$ to be true at the same time. In this way, at any 
point in time we are able to understand whether we are on a trace that is not diagnosable 
(and thus there is uncertainty) or whether the diagnoser knows that the condition did not 
occur. This can thus provide additional insight on the behavior of the system. 
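
The annotation step can be sketched as follows for the propositional case. This is a minimal illustration, assuming that a belief state is a non-empty set of plant states and that the condition is given as a Boolean predicate over a single state. The function name `annotate`, the predicate `faulty` and the three-letter state names are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, Tuple

State = Hashable


def annotate(belief: FrozenSet[State],
             holds: Callable[[State], bool]) -> Tuple[bool, bool]:
    """Return the pair (A_phi, A_not_phi) for a belief state.

    A_phi is true iff the condition holds in every state of the belief;
    A_not_phi is true iff it holds in none of them.
    """
    # Assumption of this sketch: beliefs reached by the construction are
    # non-empty, so the two alarms can never be true together.
    if not belief:
        raise ValueError("empty belief state")
    a_phi = all(holds(s) for s in belief)
    a_not_phi = all(not holds(s) for s in belief)
    return a_phi, a_not_phi


def faulty(state: str) -> bool:
    # Hypothetical encoding: the first letter is "F" for faulty states.
    return state.startswith("F")


print(annotate(frozenset({"FPC", "FPN"}), faulty))  # (True, False): fault known
print(annotate(frozenset({"NPC"}), faulty))         # (False, True): no fault known
print(annotate(frozenset({"NPC", "FPC"}), faulty))  # (False, False): uncertain
```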

Figure 17 provides a pseudo-code of the main function of the synthesis task: the 
construction of the belief automaton. Starting from the set of initial states, we perform an 
explicit visit until we have explored all belief states. For each belief state we first compute 
its u-transitive closure (u_trans_closure) w.r.t. the non-observable events E, obtaining b*. 
We then compute the possible observable events available from b*, and iterate over each 
event o_i, obtaining the set of states target_belief such that T(b*, o_i, target_belief) is 
satisfied (reachable_w_obs). We can now add a transition to our automaton linking the belief 
state b to the belief state target_belief through the event o_i. Once we have completed this 
phase, we have an automaton with labeled transitions. The automaton resulting from this 
function can then be annotated by visiting each state and testing whether the state entails 
(or not) the alarm specification. 
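
A runnable explicit-state rendering of this construction is sketched below. It is an illustration under stated assumptions rather than the symbolic implementation used in the tool: the plant is given as a dictionary from events to sets of (source, target) pairs, an observable event is treated as enabled when it has at least one target, and the initial belief is recorded in the visited set so that it is always part of the result. The three-state plant and its event names are made up.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

State = Hashable
Belief = FrozenSet[State]
# trans[event] is the set of (source, target) plant transitions for event.
Trans = Dict[str, Set[Tuple[State, State]]]


def u_trans_closure(belief: Belief, trans: Trans, observable: Set[str]) -> Belief:
    """States reachable from the belief using only non-observable events."""
    closure = set(belief)
    frontier = list(belief)
    while frontier:
        s = frontier.pop()
        for event, pairs in trans.items():
            if event in observable:
                continue
            for src, dst in pairs:
                if src == s and dst not in closure:
                    closure.add(dst)
                    frontier.append(dst)
    return frozenset(closure)


def reachable_w_obs(b_star: Belief, event: str, trans: Trans) -> Belief:
    """States reachable from b_star with one occurrence of the event."""
    return frozenset(dst for src, dst in trans.get(event, set()) if src in b_star)


def belief_automaton(initial: Set[State], trans: Trans, observable: Set[str]):
    b0 = frozenset(initial)
    visited = {b0}
    edges: Dict[Tuple[Belief, str], Belief] = {}
    stack = [b0]
    while stack:
        b = stack.pop()
        b_star = u_trans_closure(b, trans, observable)
        for o in sorted(observable):
            target = reachable_w_obs(b_star, o, trans)
            if not target:
                continue  # o is not enabled in any state of b*
            edges[(b, o)] = target
            if target not in visited:
                visited.add(target)
                stack.append(target)
    return visited, edges


# Made-up plant: N (nominal), F (faulty), O (off); "fault" is not observable.
plant = {
    "fault": {("N", "F")},
    "tick": {("N", "N"), ("F", "F")},
    "off": {("F", "O")},
}
visited, edges = belief_automaton({"N"}, plant, {"tick", "off"})
print(len(visited), len(edges))
```

In this toy plant, observing `off` from the initial belief leads to the singleton belief containing `O`, while observing `tick` leads to a belief containing both `N` and `F`, since the non-observable fault may or may not have occurred.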

6.2. Running Example. We show the first step of the algorithm on a simplified version 
of the battery component of our running example (Figure 5). We ignore the events related 
to threshold passing of the battery (Mid, Low, High) and only consider the observable event 
Off, signaled when the charge reaches zero, and the ones due to mode changes. To keep 
the representation compact, we indicate each state with three symbols. For example, we 
use ($NPC$) to indicate the state “Nominal, Primary, Charging” and ($NP\overline{C}$) to indicate 
the state “Nominal, Primary, Not Charging”. Similarly we use F, O, and D to indicate 
Faulty, Offline and Double. We recall that in the original model, the mode transitions are 
observable but all other transitions are not. 

In the first step (Figure 18), we take the set of initial states. This is the set of states 
($NPC$) for any value of the charge $\in [0, C]$. The u-transitive closure needs to take into 
account all non-observable transitions. Therefore, we need to consider going from Nominal 
to Faulty, from Charging to Not Charging, and their combination. 



Figure 18. Expanding the initial belief state of the battery LTS. 


These are all the states that are reachable before an observable event can occur. We 
now take each observable event and compute the set of states that are reachable with one 
of the observable events (Figure 19): the battery being discharged (Off), and the change 
of mode (Offline, Double). Note that one of the belief states is smaller than the others. 




Figure 19. Expanding the belief state via observable transitions 


This is due to the fact that in our model, the discharging of the battery cannot occur if 
the battery is nominal, charging and in primary mode (NPC). Thus, the fact that we 
receive the Off event allows us to exclude that state. The state obtained by computing 
the transitive closure is not part of our final automaton, and is provided in the figure only 
to simplify the understanding. 

We repeat these two steps until all belief states have been explored. We then proceed 
to the labeling phase, in which we label each state with the corresponding alarm. 
For example, by considering the alarms $\textsc{ExactDel}_K(A_{NC}, Nominal \wedge Charging, 0)$ and 
$\textsc{ExactDel}_K(A_N, Nominal, 0)$, we obtain the diagnoser partially represented in Figure 20. 
Notice how, in the initial state we can raise the alarm $A_{NC}$, and this alarm can only be 
changed by an observable transition. 


6.3. Formal Properties of the Synthesized Diagnoser. We now show that the generated 
transition system is a diagnoser and that it is correct, complete and maximal. Let us 
assume that $\varphi$ is an exact delay specification, with delay zero. Any other alarm condition 
can be reduced to this case. We build a new plant $P'$ by adding a monitor variable $\tau$ to $P$ 
s.t. $P' = P \times (G(\tau(\varphi) \leftrightarrow \tau))$, where we abuse notation to indicate the synchronous 
composition of the plant with an automaton that encodes the monitor variable. By rewriting the 
alarm condition as $\varphi' = \textsc{ExactDel}(A_{\varphi'}, \tau, 0)$, we obtain that $D \otimes P \models \varphi$ iff $D \otimes P' \models \varphi'$. 
Thus, it is sufficient to show the following results only for the zero delay case. We define 
$D_\varphi$ as the diagnoser for $\varphi$. $D_\varphi = \langle V^{D_\varphi}, E^{D_\varphi}, I^{D_\varphi}, T^{D_\varphi} \rangle$ is a symbolic representation of 
$B(P)$ with $A_\varphi \subseteq V^{D_\varphi}$, $E^{D_\varphi} = E^P_o$ and such that every state $b$ of $D_\varphi$ represents a state in 
$B$ (with abuse of notation we do not distinguish between the two since the assignment to 
$A_\varphi$ is determined by $b$). 

Theorem 6.1. $D_\varphi$ is deterministic. 

Proof. The result follows directly from the definition of the belief automaton, which is 
deterministic (one initial state and one successor). Note that the assignment to $A_\varphi$ is not 
relevant since it is determined by the belief state. □ 

Lemma 6.2. For every reachable state $b \times s$ of $D_\varphi \otimes P$, for every trace $\sigma$ reaching $b \times s$, 
for every state $s' \in b$, there exists a trace $\sigma'$ reaching $b \times s'$ with $obs(\sigma) = obs(\sigma')$. 

Proof. By induction on $\sigma$. All traces are observationally equivalent in the initial state. Let 
$(b_1 \times s_1, e, b \times s)$ be the last transition of $\sigma$ and let $\sigma_1$ be the prefix of $\sigma$ without this last 
transition. If $e \in E \setminus E_o$ then $obs(\sigma) = obs(\sigma_1)$. Otherwise, for every state $s' \in b$ there 
exists a transition $(s'_1, e, s')$ such that $s'_1 \in b_1^*$. By inductive hypothesis there exists a trace 
$\sigma'_1$ reaching $b_1 \times s'_1$ such that $obs(\sigma_1) = obs(\sigma'_1)$. Therefore the concatenation of $\sigma'_1$ with the 
transition $(b_1 \times s'_1, e, b \times s')$ results in a trace $\sigma'$ reaching $b \times s'$ such that $obs(\sigma) = obs(\sigma')$. □ 

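Theorem 6.3 (Maximality). $D_\varphi \otimes P \models G(\llcorner K\tau(\varphi) \lrcorner \rightarrow \llcorner A_\varphi \lrcorner)$. 

Proof. Consider a trace $\sigma$ and $i \geq 0$. If $\sigma, i \models \llcorner K\tau(\varphi) \lrcorner$, then for all traces $\sigma'$ and points 
$j$ s.t. $ObsEq((\sigma, i), (\sigma', j))$, $\sigma', j \models \tau(\varphi)$. By Lemma 6.2, for all states $s \in \sigma[i]$ there exists a 
trace $\sigma'$ with $obs(\sigma) = obs(\sigma')$, and therefore $s \models \tau(\varphi)$ so that $\sigma[i] \models \llcorner A_\varphi \lrcorner$. □ 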

Lemma 6.4. Given a trace $\sigma$ of $D_\varphi \otimes P$, let $\sigma[i] = b \times s$. If $i$ is an observation point, 
then $s \in b$. 

Proof. By assumption, $i$ is the $n$-th observation point of $\sigma$ for some $n$. We prove the lemma 
by induction on $n$. 

Consider the case $n = 1$. If $\sigma[0] = b_0 \times s_0$, by construction of $D_\varphi$, $s_0 \in b_0$. Let 
$\sigma[i-1] = b' \times s'$ and let $e$ be the $i$-th (observable) event of $\sigma$. If $i$ is the first observation 
point of $\sigma$, it means that $b' = b_0$ and $s' \in b_0^*$. Moreover, $(s', s) \in T(e)$ and therefore $s \in b$. 

Consider the case $n > 1$. Let $j$ be the $(n-1)$-th observation point, $\sigma[j] = b_j \times s_j$, $\sigma[i-1] = 
b' \times s'$ and let $e$ be the $i$-th (observable) event of $\sigma$. Similarly to the previous case, $b' = b_j$ 
and $s' \in b_j^*$. Moreover $(s', s) \in T(e)$ and therefore $s \in b$. □ 

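Theorem 6.5 (Correctness). $D_\varphi \otimes P \models G(\llcorner A_\varphi \lrcorner \rightarrow \tau(\varphi))$. 

Proof. Consider a trace $\sigma$ and $i \geq 0$. Suppose $\sigma, i \models \llcorner A_\varphi \lrcorner$ and let $\sigma_D$ and $\sigma_P$ be respectively 
the left and right component of $\sigma$. Then, for all $s \in \sigma_D[i]$, $s \models \tau(\varphi)$. Since $i$ is an 
observation point, by Lemma 6.4 $\sigma_P[i] \in \sigma_D[i]$. We can conclude that $\sigma[i] \models \tau(\varphi)$. □ 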

Theorem 6.6 (Completeness). If $\varphi$ is an alarm condition required to be trace diagnosable, 
then $D_\varphi$ is complete. If $\varphi$ is a system diagnosable condition and $\varphi$ is diagnosable in $P$, then 
$D_\varphi$ is complete. 

Proof. Since $D_\varphi$ is maximal and correct (Theorems 6.3 and 6.5), we can apply Theorem 4.3 
(if $\varphi$ is trace diagnosable) or Theorem 4.4 (if it is system diagnosable) to obtain completeness. □ 


7. Industrial Experience 

The methods described in this paper have been motivated by AUTOGEF, a project [Eur10, 
AUT, ANY+12] funded by the European Space Agency. The main goal of the project 
was the definition of a set of requirements for an on-board Fault Detection, Identification 
and Recovery (FDIR) component and its synthesis. The problem was cast in the frame 
of discrete event systems, communicating asynchronously, and tackled by synthesizing the 
Fault Detection (FDI) and Fault Recovery (FR) components separately - with the idea that 
the FDI provides sufficient diagnosis information for the FR to act on. 

A similar problem was further investigated in FAME, another ESA-funded 
project [Eur11, FAM, GFB+14, BBC+14a, BBC+14b]. In the context of FAME, we 
addressed the problem of synthesis of FDI and FR components for continuous time systems, 
with synchronous communication - in particular the diagnoser communicates with the plant 
by sampling the values of the sensors at periodic time intervals. In both cases, AUTOGEF 
and FAME, we addressed the problem of FiniteDel diagnosis, which was of interest from 
an industrial perspective. 

Within AUTOGEF, the design approach was initially evaluated using scalable benchmark 
examples. Then, Thales Alenia Space evaluated AUTOGEF on an industrial case 
study based on the EXOMARS Trace Gas Orbiter. This case-study is a significant 
application of the approach described in this paper, since it covers all the phases of the FDIR 
development process. The (nominal and faulty) behavior of the system was modeled using 
a formal language. A table-based and pattern-based approach was adopted to describe the 
mission phases/modes and the observability characteristics of the system. The specification 
of FDIR requirements by means of patterns greatly simplified the accessibility of the tool 
to engineers that were not experts in formal methods. Alarms were specified in the case 
of finite delay, under the assumption of trace diagnosability and maximality of the 
diagnoser. Different faults and alarms were associated with specific mission phases/modes and 
configurations of the system, which enabled generation of specific alarms (and recoveries) 
for each configuration. The specification was validated by performing diagnosability 
analysis on the system model. The synthesis routines were run on a system composed of 11 
components, with 10 faults in total, and overall 90 bits of variables, and generated an FDI 
component with 754 states. Finally, the correctness of the diagnoser was verified by using 
model-checking routines. Synthesis and verification capabilities have been implemented on 
top of the nuXmv model checker. We remark that the ability to define trace diagnosable 
alarms was crucial for the synthesis of the diagnoser, since most of the modeled faults were 
not system diagnosable. 

A similar approach was undertaken in FAME. The industrial evaluation was carried 
out on a further elaboration of the Trace Gas Orbiter case study, adapted to take into 
account timings of fault propagation. The specification of the FDIR requirements and the 
verification, validation and synthesis process were done in a similar way. As a difference 
with AUTOGEF, the synthesis of FDI in FAME was aided by the specification of a fault 
propagation model, in the form of a Timed Failure Propagation Graph (TFPG) [BBC+14a, 
BCGM15]. The case study investigated fault management related to the feared event ‘loss 
of the spacecraft attitude’. A total of 3 faults, instantiated for two (redundant) instances 
of the Inertial Management Unit (IMU) component were considered. The synthesis of FDI 
produced an FDI component with 2413 states. 

Successful completion of both projects, and positive evaluations from the industrial 
partner and ESA, suggest that a significant first step towards a formal model-based design 
process for FDIR was achieved. 


8. Related Work 

8.1. From Synchronous to Asynchronous FDI. This work is closely related 
to [BCGT14]. The key difference is that we extended the approach to include the 
asynchronous composition of the plant with the diagnoser. This extension is useful in practice, 
since many real-life systems as well as many high-level modeling languages adopt an 
asynchronous, event-based view. In the synchronous case system and diagnoser share the same 
time scale, and the diagnoser takes a step every time the system does. In the asynchronous 
setting, on the other hand, the diagnoser takes a step only when the system exhibits an 
observable behavior (i.e., an observable event). 

Although this could be seen as a minor difference, it poses nontrivial problems. First 
of all, since the diagnoser cannot update the value of the alarms at every point in time, 
we need to restrict the definition of Correctness and Completeness to the occurrence of a 
synchronization, in which the diagnoser can update the alarms, by introducing observation 
points and using the observed version $\llcorner A \lrcorner$ of $A$. Similarly, since the diagnoser can update 
its knowledge of the plant only during synchronizations, also the epistemic operator is 
considered in the observation points. Therefore, we define $K$ as usual, but then introduce 
a stronger version $\llcorner K \lrcorner$ that is the basis for most of our definitions. 

The synthesis algorithm also needs to take into account multiple transitions from 
the plant that are executed without synchronization. This is done by introducing the 
u-transitive closure of the belief states. 

Finally, to keep the formalism simple, we modeled the observability of state variables 
as observable events. This is mainly due to the fact that a change in observable state 
variables requires the introduction of a new synchronization event between the plant and the 
diagnoser in order to allow the diagnoser to update its knowledge. This idea is consistent 
with the approach defined in [SSL+96]. Also, in other works on knowledge in an 
asynchronous setting (e.g., [vdM07]), the fact that the observer sees every observable state change 
implicitly assumes that the observable state change triggers a synchronization. Note that 
this is somewhat different from asynchronous systems with shared variables, where a process 
can see the change of the shared variable only when/if scheduled. 

We notice that the synchronous case can be embedded in the asynchronous one. In 
fact, according to Def. 2.2 a synchronous product is obtained by making all events of the 
plant observable: $E^P_o = E^P$. This implies that all points are observation points. Therefore, 
$\llcorner A \lrcorner = A$, and the restriction of $K$ to observation points has no effect. Also the u-transitive 
closure has no effect, and we see that $b^* = b$. 

8.2. FDI Specification. In order to formally verify the effectiveness of an FDI component 
as part of an overall fault-management strategy, both a formal model of the FDI component 
(e.g., as an automaton) and of its expected behavior (requirements) is required. Contrary 
to works related to diagnosis compilation, we are also interested in verifying that an FDI 
satisfies a given specification. This has tremendous value when we consider the problem of 
checking whether an existing system (that is familiar to the system designer) satisfies the 
specification and thus is functionally equivalent to an automatically synthesized one (that 
could be complex and hard to understand). 

Previous works on formal FDI development have considered the specification and 
synthesis in isolation. Our approach differs from the state of the art because we provide a 
comprehensive view on the problem. Due to the lack of specification formalism for 
diagnoses, the problem of verifying their correctness, completeness and maximality was, to the 
best of our knowledge, unexplored. 

Concerning specification and synthesis, [JK04] is close to our work. The authors present 
a way to specify the diagnoser using LTL properties, and present a synthesis algorithm 
for this specification. However, problems such as maximality and trace diagnosability are 
not taken into account. Another remarkable difference is that [JK04] considers diagnosis 
conditions with future operators. This enables the definition of alarms that predict the 
occurrence of an event (i.e. prognosis), that is currently not captured in our work. 

8.3. Diagnosability. In many practical situations it is not possible to require system 
diagnosability, due, for example, to critical pairs that exist only in a particular configuration 
of the system. We introduce the concept of trace diagnosability, that is a distinguishing 
feature of our approach, and overcomes a strong limitation in the current state-of-the-art. 

The idea of using epistemic properties to analyze the diagnosability of a system had 
been already proposed in [ELMV11] and [Hua13]. Notably, the latter extends the problem 
to a probabilistic setting, and draws a link with the classical definition of diagnosability, 
introducing the idea of L-diagnosability (that is equivalent to our finite-delay diagnosability). 
Our approach extends these works by considering other types of delay and the problem of 
trace diagnosability. Moreover, we do not focus only on the diagnosability problem, but 
also provide a way of specifying the diagnoser and characterize its completeness in terms of 
epistemic temporal logic. 

We extend the results on diagnosability checking from [CPC03] in order to provide an 
alternative way of checking diagnosability and redefine the concept of diagnosability at the 
trace level. 

8.4. Runtime Verification. The main difference between diagnosis and runtime 
verification is the partial observability of the plant. Works on runtime verification assume [HR04] 
that the properties to be verified are expressed over observable variables of the system. In 
diagnosis, instead, we define the properties over non-observable parts of the system and 
then ask whether it is possible to infer them by looking at the observable part of the 
system. Therefore, while some approaches for runtime verification do not need a model of the 
system (i.e., black-box approach), in diagnosis we need to have some information about the 
behavior of the system. Finally, in [BLS11] the authors propose the use of a three-valued 
LTL variant to define whether a trace satisfies a property, does not satisfy it or whether 
there is not enough information to come to a conclusion. This might resemble the approach 
presented in Section 6 by our synthesis algorithm. However, the difference is substantial. 
Every time our diagnoser is uncertain, it means that there are two traces $\sigma_1$ and $\sigma_2$ that are 
observationally equivalent, but one satisfies the property and the other does not. However, 
if we could have an oracle that would tell us whether the system is in $\sigma_1$ or in $\sigma_2$, we could 
state (without uncertainty) whether the property is satisfied or not. In [BLS11] instead, 
the inconclusiveness of the monitor is intrinsic in the fact that the given trace neither 
satisfies nor violates the property. 

9. Conclusions and Future Work 

This paper presents a formal approach for the design of FDI components, that covers many 
practically-relevant issues such as delays, non-diagnosability and maximality. The design is 
based on a formal semantics provided by temporal epistemic logic and can be used both in 
a synchronous and asynchronous setting. We cover the specification, validation, verification 
and synthesis steps of the FDI design, and discuss the applicability of the approach on a 
case-study from aerospace. To the best of our knowledge, this is the first work that provides 
a formal and unified view to all the phases of FDI design. 

In the future, we plan to explore the following research directions. First, we will 
extend FDI to deal with infinite-state systems. Secondly, we will experiment with different 
assumptions on the memory requirements for the diagnoser, i.e., relax the perfect recall 
assumption. 

Another interesting line of research is the development of optimized reasoning 
techniques for temporal epistemic logic. The idea is to consider the fragment that we are using, 
both for verification and validation, and to evaluate and improve the scalability of the 
synthesis algorithms. 

Finally, we will work on integrating the FDI component with the recovery procedures. 